AIA fetching in Chrome for Android


Emily Stark

Sep 21, 2016, 1:33:01 AM
to net-dev, Ryan Sleevi, Eric Roman
Following some discussion with rsleevi@ and eroman@, I'd like to add intermediate fetching during Chrome's certificate verification on Android. From Safe Browsing Extended Reporting data, we estimate that a significant percentage of certificate errors are due to servers serving chains that omit the necessary intermediates. (Android does not fetch intermediates during certificate verification.)

I wrote up a doc explaining more about the problem and motivation, why we'd like to do the intermediate fetching from within Chrome on Android, and an attempt at a plan for implementing it: https://docs.google.com/document/d/1ryqFMSHHRDERg1jm3LeVt7VMfxtXXrI8p49gmtniNP0/edit#

As usual, all sorts of feedback is welcome and much appreciated. Thanks!
Emily

Chris Bentzel

Sep 21, 2016, 8:11:14 AM
to Emily Stark, net-dev, Ryan Sleevi, Eric Roman
Excited to see this happen, especially given the stats listed at the top of the doc.

Ryan Sleevi

Sep 21, 2016, 12:25:15 PM
to Chris Bentzel, Emily Stark, net-dev, Ryan Sleevi, Eric Roman
I left the comment on the doc, but I'm really uncomfortable putting this in CertVerifierJob.

MultiThreadedCertVerifier is meant to be what the name suggests - a platform-agnostic CertVerifier that handles threading. This introduces a significant part of the path-building logic into it, which we've tried very intentionally to constrain to CertVerifyProc::Verify (in the case of platform-agnostic logic, such as crypto policy and metrics) and CertVerifyProc::VerifyInternal (in the case of platform-specific logic).
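To sketch that layering roughly (the class and method names are real, but the signatures here are simplified for illustration, not the exact net/cert declarations):

  class CertVerifyProc {
   public:
    // Platform-agnostic entry point: crypto policy checks, metrics, etc.,
    // wrapped around the platform-specific verification.
    int Verify(X509Certificate* cert, const std::string& hostname,
               int flags, CertVerifyResult* verify_result) {
      int rv = VerifyInternal(cert, hostname, flags, verify_result);
      // ... shared post-processing of |verify_result| ...
      return rv;
    }

   protected:
    // Platform-specific logic: CertVerifyProcAndroid, CertVerifyProcNSS, etc.
    virtual int VerifyInternal(X509Certificate* cert,
                               const std::string& hostname,
                               int flags, CertVerifyResult* verify_result) = 0;
  };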

Conceptually, the AIA path building is similar to the path-hacking we have to do on OS X to account for the fact that OS X doesn't do graph traversal of the chain, and thus it's necessary to "chop off" inputs that we hand to the OS, in the event the user has an improper Keychain entry that's defeating the path validation logic of the OS.

I'm not sure if it's easier on the doc or on this thread, but I'd be curious to know if putting this logic in CertVerifyProc::VerifyInternal was considered. Admittedly, some of the object lifetimes can be complicated here, so I can understand why that may not be appealing.

Alternatively, if we said that CertVerifyProc is hopelessly complex (although the NSS code already has to do something similar to this in CVPNSS, and, as mentioned, CVPMac has to do chain correction for OS X), I'd be curious whether you explored a separate, concrete CertVerifier instance that could be composed on Android, much in the same way that you can compose caching (CachingCertVerifier) and threading (MultiThreadedCertVerifier).

It would *seem* like, if it was necessary, an AIAChasingCertVerifier could be written that interposes above the MultiThreadedCertVerifier (which speaks directly to a CertVerifyProc) and below caching, and, like caching, simply dispatches to the underlying CertVerifier implementation.
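A minimal sketch of that shape (Verify() signature simplified; the actual AIA retry logic, which would go in the completion path, is elided):

  class AIAChasingCertVerifier : public CertVerifier {
   public:
    explicit AIAChasingCertVerifier(std::unique_ptr<CertVerifier> inner)
        : inner_(std::move(inner)) {}

    int Verify(const RequestParams& params, CertVerifyResult* verify_result,
               const CompletionCallback& callback,
               std::unique_ptr<Request>* out_req) override {
      // Dispatch to the underlying verifier; on an incomplete-chain result,
      // fetch the AIA intermediates and re-dispatch with an augmented chain.
      return inner_->Verify(params, verify_result, callback, out_req);
    }

   private:
    std::unique_ptr<CertVerifier> inner_;
  };

  // Composed the same way caching and threading are layered today:
  //   CachingCertVerifier -> AIAChasingCertVerifier -> MultiThreadedCertVerifier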


Emily Stark

Sep 21, 2016, 8:35:24 PM
to Ryan Sleevi, Chris Bentzel, Emily Stark, net-dev, Eric Roman
Hey Ryan, I'll continue the discussion here on the thread since I find doc comment boxes tend to get a little constraining.

I was initially thinking about putting this logic in CertVerifyProc, and I talked with Eric a little bit about that off-list before I wrote up this proposal. He made an argument, which I found convincing, that worker threads shouldn't block on URLRequests happening on the IO thread. However, I hadn't seen nss_ocsp before (which you pointed to in the doc), and didn't realize that it does exactly that! I'm guessing maybe he hadn't seen that code either? Eric, do you feel differently about pursuing an approach where CertVerifyProc kicks off the AIA fetches on the IO thread given that we already do basically the same thing for HTTP requests that NSS needs?

(All that said, even if we agree that CertVerifyProc can kick off and wait on the fetches on the IO thread, I haven't sketched out that approach in detail yet so it might still turn out to be very hairy for other reasons, in which case I would explore the idea of composing an AIAChasingCertVerifier.)

Thanks,
Emily

Ryan Sleevi

Sep 21, 2016, 8:47:03 PM
to Emily Stark, Ryan Sleevi, Chris Bentzel, net-dev, Eric Roman
On Wed, Sep 21, 2016 at 5:35 PM, Emily Stark <est...@chromium.org> wrote:
Hey Ryan, I'll continue the discussion here on the thread since I find doc comment boxes tend to get a little constraining.

I was initially thinking about putting this logic in CertVerifyProc, and I talked with Eric a little bit about that off-list before I wrote up this proposal. He made an argument, which I found convincing, that worker threads shouldn't block on URLRequests happening on the IO thread. However, I hadn't seen nss_ocsp before (which you pointed to in the doc), and didn't realize that it does exactly that! I'm guessing maybe he hadn't seen that code either? Eric, do you feel differently about pursuing an approach where CertVerifyProc kicks off the AIA fetches on the IO thread given that we already do basically the same thing for HTTP requests that NSS needs?

Right, my uncertainty around your design was because I wasn't sure if you'd considered this and intentionally ruled it out, or just weren't familiar with it (since it is subtle).

There's another catch to the NSS design, which, while primarily driven by NSS requirements, might have been a reason for you to design the way you did. NSS sets a global HTTP context for satisfying requests, which means that we can't associate the HTTP client with each CertVerifyProc. As a consequence, we need a 'global' URLRequestContext that's available, which means the SystemURLRequestContext (in effect). The implications of this for Linux/ChromeOS are that any user-defined proxy settings (such as via extensions or cloud policies) are left on the user's URLRequestContext, and don't propagate to the SystemURLRequestContext (which is user-agnostic), which means that some AIA fetches fail.

So, these are two downsides (blocking worker threads, using the system URL request context), but it's unclear whether they're big enough to warrant breaking the CertVerifier threading abstraction. They may be, but my gut suggests that they're reasonable tradeoffs (that is, we're happy with them for Linux/ChromeOS for now). Given the tradeoffs we've also had to make on other platforms (e.g. OS X CertVerifyProcs are synchronized behind a Big Global Lock, because Security.framework is not reliably thread-safe across all versions of OS X; CryptoAPI on Windows internally synchronizes some operations), I'm not too worried about the performance tradeoff here, especially if it's primarily for pathological cases that would otherwise not work. While I would love to believe that the performance hit would be the incentive for sites to send different chains, if these sites were paying attention to their engagement and usage metrics, they presumably would have fixed these issues. Still, we might see some small gains.
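(For anyone who hasn't read nss_ocsp: the blocking pattern on the verification worker thread looks roughly like the following, where StartAIAFetch is a hypothetical helper standing in for the real plumbing that runs a URLRequest against the system context on the IO thread.)

  // On the verification worker thread:
  base::WaitableEvent done(true /* manual_reset */, false /* signaled */);
  std::string der_intermediate;
  io_task_runner->PostTask(
      FROM_HERE,
      base::Bind(&StartAIAFetch, aia_url, &der_intermediate, &done));
  done.Wait();  // Blocks the worker thread until the IO-thread fetch finishes.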

(All that said, even if we agree that CertVerifyProc can kick off and wait on the fetches on the IO thread, I haven't sketched out that approach in detail yet so it might still turn out to be very hairy for other reasons, in which case I would explore the idea of composing an AIAChasingCertVerifier.)

Sounds good. There may still be some performance hits with that approach too (due to the additional work in a CertVerifier), but in both cases, I think they'd be amortized by Android's internal intermediate caching (which is already a point of lock synchronization), which should reduce the number of times we'd have to hit the unhappy path. 

Emily Stark

Sep 22, 2016, 2:01:23 AM
to Ryan Sleevi, Emily Stark, Chris Bentzel, net-dev, Eric Roman
On Wed, Sep 21, 2016 at 5:46 PM, Ryan Sleevi <rsl...@chromium.org> wrote:
There's another catch to the NSS design, which, while primarily driven by NSS requirements, might have been a reason for you to design the way you did. NSS sets a global HTTP context for satisfying requests, which means that we can't associate the HTTP client with each CertVerifyProc. As a consequence, we need a 'global' URLRequestContext that's available, which means the SystemURLRequestContext (in effect). The implications of this for Linux/ChromeOS are that any user-defined proxy settings (such as via extensions or cloud policies) are left on the user's URLRequestContext, and don't propagate to the SystemURLRequestContext (which is user-agnostic), which means that some AIA fetches fail.

So, these are two downsides (blocking worker threads, using the system URL request context), but it's unclear whether they're big enough to warrant breaking the CertVerifier threading abstraction. They may be, but my gut suggests that they're reasonable tradeoffs (that is, we're happy with them for Linux/ChromeOS for now).

Just want to check that I understand: for the system URLRequestContext issue, it seems like that's imposed by the NSS API, but in implementing AIA fetching for Android, we could use the profile URLRequestContext if we so desire. Is that right, or am I misunderstanding? (That is, my understanding is that CVPAndroid could have a pointer to a CertNetFetcher which uses the profile URC, whereas in the NSS case, there's a single function that gets called to make an HTTP request, so it must use a global URC.)

Ryan Sleevi

Sep 22, 2016, 2:23:08 AM
to Emily Stark, Ryan Sleevi, Chris Bentzel, net-dev, Eric Roman
On Wed, Sep 21, 2016 at 11:01 PM, Emily Stark <est...@chromium.org> wrote:
Just want to check that I understand: for the system URLRequestContext issue, it seems like that's imposed by the NSS API, but in implementing AIA fetching for Android, we could use the profile URLRequestContext if we so desire. Is that right, or am I misunderstanding?

I don't think we can, but that's my fault for not explaining why more.

Because CertVerifier::CreateDefault() (and CertVerifyProc::CreateDefault()) exist, there's no way to inject dependencies into either all that well. This is somewhat similar to our ProxyResolver woes - we either smuggle unnecessary parameters for (most) platforms, as we do with the UI context, or we have to accept that our dependency chain is 'hidden' via globals (which is, in effect, how the NSS SystemURLRequestContext is accessed - NSS's HTTP client is an internal-to-NSS global pointer).

This aspect is true whether we're talking CertVerifier, CertVerifyJob, or CertVerifyProc - there's no good pivot to add that dependency in a clean way.

Even an AIACertVerifier might be tricky, because if you added something like AIACertVerifier::SetCertNetFetcher(), you'd still need access to the AIACertVerifier as an AIACertVerifier to be able to call that method (rather than as a CertVerifier*, which loses the type and thus the method), and I don't believe there are many good places where the profile URLRequestContext (which needs a CertVerifier) is created that could then hand back the AIACertVerifier used to initialize it.
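In other words (a contrived snippet just to illustrate the type problem, not real code):

  // Callers get the verifier only through the base interface:
  std::unique_ptr<CertVerifier> verifier = CertVerifier::CreateDefault();

  // ... later, where the profile URLRequestContext (and its CertNetFetcher)
  // is created:
  // verifier->SetCertNetFetcher(fetcher);
  //     ^ won't compile: CertVerifier has no such method, and the concrete
  //       AIACertVerifier type was erased when the verifier was created.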

While the use of SystemURLRequestContext (and the implicit or explicit global that entails) is gross in a way that also makes me uncomfortable, I don't have any good suggestions for avoiding it for any of the solutions, and it's not the end of the world. Given that Android doesn't have to worry about extensions mucking about with proxy settings, and AFAIK, we take the system proxy settings rather than any cloud-profile proxy settings, I think we should be fine with something like the NSS code?

Emily Stark

Sep 22, 2016, 2:36:19 AM
to Ryan Sleevi, Emily Stark, Chris Bentzel, net-dev, Eric Roman
Ohh, I think I see what you're saying, thanks. I'll think on this a bit more with a fresher brain tomorrow to make sure I understand what you're getting at. (And I agree that it's not the end of the world -- I suspect that AIA-fetches-with-the-system-URC will still be a vast, vast improvement for users over no-AIA-fetches-at-all.)


Brian Smith

Sep 22, 2016, 8:05:05 PM
to Emily Stark, net-dev, Ryan Sleevi, Eric Roman
Could you also share the documentation of the alternatives to AIA fetching you explored and why they are unacceptable? Also, what reasons has the Android team given for avoiding AIA fetching for so long?

I think everybody agrees that false positive certificate errors are bad and that it is worth spending effort to avoid them.

As is typical in any feature which exists to wallpaper over a problem instead of solving it, the AIA fetching mechanism has several downsides as well. In particular, such fallback mechanisms actually encourage people to leave their servers misconfigured, since the misconfiguration will be "magically" fixed by the browsers. Thus, realistically, we should expect that this change will increase the number of (mobile) websites that are misconfigured. Then we'd be in a vicious spiral where even more implementations would feel compelled to implement AIA fetching to be compatible with Chrome for Android, which would encourage even more websites to be misconfigured, ad infinitum. For this reason, I think it is worth trying to find an alternative that has positive short-term effects for Chrome for Android *and* positive long-term effects on the web as a whole.

In the document, you write "Using data from Chrome’s Safe Browsing Extended Reporting program, we estimate that server chains with incorrect or missing intermediates account for >10% of all certificate validation errors in Chrome, and >30% of all certificate validation errors that occur in Chrome for Android. About 90% of the errors caused by missing or misconfigured intermediates occur on Android."
Could you explain how these numbers 10%, 30%, and 90% are calculated? Are the 10%/30%/90% numbers indicative of the number of times that users see the certificate error page, or are they indicative of the number of pageloads with such certificate errors, or are they indicative of the number of TLS connections with such certificate errors, or HTTP requests on TLS connections with such certificate errors, or something else?

In particular, do you expect that this change will account for a 10%/30% reduction in certificate error pages seen by users? And, if this change isn't expected to improve things by the full 10%/30%, then what is expected improvement?

FWIW, my hypothesis, which I haven't gotten around to verifying or refuting with an experiment, is that a small number of intermediates likely account for a huge percentage of the problem, and that if the browser simply shipped with a small number (say, less than 100) of these intermediates, without doing AIA fetching on any platform, the problem would mostly go away, while minimizing the negative effect of encouraging misconfiguration over time. I think it would be great to have a browser try an experiment along these lines.
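To sketch what I mean (all names hypothetical; keying on the DER-encoded issuer name just for illustration):

  // Shipped alongside the root store and updated through the same channel.
  // Key: DER-encoded subject name; value: DER-encoded certificate.
  const std::map<std::string, std::string>& PreloadedIntermediates();

  // Consulted during path building, before declaring the chain incomplete:
  const std::string* FindBundledIssuer(const std::string& der_issuer_name) {
    const auto& bundle = PreloadedIntermediates();
    auto it = bundle.find(der_issuer_name);
    return it == bundle.end() ? nullptr : &it->second;
  }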

Cheers,
Brian

Ryan Sleevi

Sep 22, 2016, 8:11:40 PM
to Brian Smith, Emily Stark, net-dev, Ryan Sleevi, Eric Roman
On Thu, Sep 22, 2016 at 5:04 PM, Brian Smith <br...@briansmith.org> wrote:
As is typical in any feature which exists to wallpaper over a problem instead of solving it, the AIA fetching mechanism has several downsides as well.

Hi Brian,

I think I'd have to disagree with this characterization of AIA. I realize we may simply disagree, but as discussed with the networking team, AIA fetching represents an important aspect of PKI mobility and changes, much like root autoupdates do. While you present it as "papering over a problem", it's equally fair (and perhaps more accurate) to highlight that it allows for PKI transitions to happen independently, without centralized coordination and management.
 
For this reason, I think it is worth trying to find an alternative that has positive short-term effects for Chrome for Android *and* positive long-term effects on the web as a whole.

As stated above, some of us believe that AIA fetching represents an important aspect of ecosystem health, and that all PKI clients should implement it for robustness.

While I know you have experience with Firefox deciding not to fetch intermediates, it's also clear that Mozilla's decisions behind that have necessitated trusting CAs well beyond when it's advisable. This problem has equally affected other consumers of the Mozilla Root Store, such as RHEL.
 
FWIW, my hypothesis, which I haven't gotten around to verifying or refuting with an experiment, is that a small number of intermediates likely account for a huge percentage of the problem, and that if the browser simply shipped with a small number (say, less than 100) of these intermediates, without doing AIA fetching on any platform, the problem would mostly go away, while minimizing the negative effect of encouraging misconfiguration over time. I think it would be great to have a browser try an experiment along these lines.

I believe the Web PKI is best suited when avoiding such centrally managed solutions. Perhaps this may change in time, particularly as the industry moves to better disclosure of intermediates in a technically discoverable way. However, the ability to AIA fetch is presently an important part in ensuring a robust, minimally disruptive PKI.

We may simply have to agree to disagree about the value of AIA.

Ryan Sleevi

Sep 22, 2016, 9:04:34 PM
to Ryan Sleevi, Brian Smith, Emily Stark, net-dev, Eric Roman


On Thu, Sep 22, 2016 at 5:10 PM, Ryan Sleevi <rsl...@chromium.org> wrote:
We may simply have to agree to disagree about the value of AIA.

It was pointed out that I wasn't clear about why I disagree.

Consider the following site - https://crt.sh - which is the go-to site for searching Certificate Transparency Logs.

It has an A+ on SSLLabs [1], which reflects a well-considered configuration of TLS.

If you examine the SSLLabs results for any of that server's IPs [2], you'll see multiple Certification Paths. The server is sending two certificates - a98b0cb41b3c9bb8ea5f7a4ed9ba3357bd94d55e [3] and 1f365c20e52ad2a6b09020a0e5539759c98df8d0 [4].

My assertion, and the disagreement with Brian, is whether the server's current configuration represents a "misconfiguration". Brian's description of AIA suggests that, because it is omitting a third possible intermediate [5], and instead relying on the client to trust the version that is self-signed [6], it is misconfigured.

Brian's suggestion (and apologies, Brian, if I'm mischaracterizing this) is that an alternative to fetching [5] on demand would be to ship it in the binary. This is necessary whenever a client doesn't have [6] trusted, and instead has a different cert trusted [7].

I reject the claim that a server, such as crt.sh, is misconfigured, simply because it supplies [3],[4] and doesn't send [5]. A server cannot know whether a client trusts [6] or [7], and it cannot know when [5] is necessary or not. Different clients may trust [6] or [7], dependent upon a variety of reasons, many of which I personally believe are necessary for a healthy PKI ecosystem (such as the deprecation of older algorithms or key sizes).

Similarly, using the criteria established, we would also need to suggest that google.com is misconfigured, because it supplies an additional intermediate, [9]. However, this intermediate is supplied precisely to ensure Google services work with older devices, which do not intrinsically trust the alternative version of [9], which is [10]. [10] is noted as the "Trust Anchor" within SSLLabs because it derives its trust chains based on a relatively updated version of the Mozilla Trust Store. However, there exist a variety of trust stores, and a variety of versions of them, as seen in projects like [11], and so there isn't a "one size fits all" configuration.

As such, calling these sorts of things "misconfigurations" sets up an unnecessary duality; it's neither, or both, or somewhere in between.

While Brian's absolutely correct that AIA also covers misconfigurations, such as a server only supplying a single certificate, I don't agree with causing users pain and conditioning them on errors simply for an ideological stand. Past attempts at such stands, such as refusing to persist TLS errors, resulted in much worse experiences for Chrome users.

The suggestion of shipping a prepackaged list of intermediates is certainly technically viable, but it has size overheads (both on disk and in memory), and it represents a static, centrally managed configuration which, as a consequence, is unable to readily evolve to the Web's needs. While such a solution may be more tenable in the future, due to the many investments Google is making in bringing greater transparency to the Web PKI, these are forward-looking things; we must still deal with the problem we have today using the technology we have today.

Emily's proposal takes a neutral approach, reflects Chrome's behaviour on every other platform, and is independently extensible, while Google (and the Chrome team) continue to invest in better solutions, in collaboration with a variety of industry partners, including other major browser vendors.

It's hardly ideal, but certainly, within the world today and the immediate future, it represents a balanced tradeoff of remaining vendor-agnostic and ensuring Chrom(e/ium) users have the best possible experience.

Emily Stark

Sep 22, 2016, 9:11:48 PM
to Brian Smith, Emily Stark, net-dev, Ryan Sleevi, Eric Roman
Hi Brian, thanks for your email. I hadn't heard some of the arguments you make before and appreciate hearing this different perspective. In addition to the replies Ryan's making about the core disagreement, I want to answer some of your specific questions -- answers inline.

On Thu, Sep 22, 2016 at 5:04 PM, Brian Smith <br...@briansmith.org> wrote:
Could you also share the documentation of the alternatives to AIA fetching you explored and why they are unacceptable? Also, what reasons has the Android team given for avoiding AIA fetching for so long?

I'm hesitant to speak for another team and it's a bit difficult to answer this without doing so. I guess one thing to point out is that I haven't personally heard the arguments you make below as one of the reasons that Android doesn't do AIA fetching. To me, the important thing is the slow release/update cycle I mentioned in the doc. Even if Android goes all-in on AIA fetching tomorrow, we'd probably still want it in Chrome until the Android implementation reaches enough users.

Independent of Android, we've talked about doing outreach to site owners: contact the top N sites that have misconfigured intermediates, do a notification in Webmaster Tools, something in DevTools, etc. I think those are things worth exploring in parallel.


In the document, you write "Using data from Chrome’s Safe Browsing Extended Reporting program, we estimate that server chains with incorrect or missing intermediates account for >10% of all certificate validation errors in Chrome, and >30% of all certificate validation errors that occur in Chrome for Android. About 90% of the errors caused by missing or misconfigured intermediates occur on Android."
Could you explain how these numbers 10%, 30%, and 90% are calculated? Are the 10%/30%/90% numbers indicative of the number of times that users see the certificate error page, or are they indicative of the number of pageloads with such certificate errors, or are they indicative of the number of TLS connections with such certificate errors, or HTTP requests on TLS connections with such certificate errors, or something else?


Chrome sends a report every time an opted-in user sees the certificate error page, and the 10%/30%/90% is the percentage of those reports that we estimate are due to misconfigured intermediates.
 
In particular, do you expect that this change will account for a 10%/30% reduction in certificate error pages seen by users? And, if this change isn't expected to improve things by the full 10%/30%, then what is expected improvement?

I'm not sure if the above explanation answers this or not? Conceptually, I'd expect a 30% reduction in certificate error pages seen by Android Chrome users. Of course there are all sorts of reasons that it might not turn out that way: our opted-in users might not be representative of the population as a whole, AIA fetches might fail for users on flaky networks, the implementation might be too slow and people might close the tab before it can complete, etc. etc.

Emily Stark

Sep 22, 2016, 9:20:11 PM
to Emily Stark, Brian Smith, net-dev, Ryan Sleevi, Eric Roman
On Thu, Sep 22, 2016 at 6:11 PM, Emily Stark <est...@chromium.org> wrote:
Chrome sends a report every time an opted-in user sees the certificate error page, and the 10%/30%/90% is the percentage of those reports that we estimate are due to misconfigured intermediates.

Oh, another thing I should mention is that these estimates are based on heuristics, which I definitely don't think are perfect; this is difficult to analyze automatically, partly because, for the reasons Ryan mentioned, it's not exactly cut-and-dried what defines a "misconfiguration." So these really are estimates, not exact figures. We've done manual inspections of random samples to verify that the estimates are in the right ballpark.

Brian Smith

Sep 22, 2016, 9:22:30 PM
to Ryan Sleevi, Emily Stark, net-dev, Eric Roman
Ryan Sleevi <rsl...@chromium.org> wrote:
Brian Smith <br...@briansmith.org> wrote:
As is typical in any feature which exists to wallpaper over a problem instead of solving it, the AIA fetching mechanism has several downsides as well.

Hi Brian,

I think I'd have to disagree with this characterization of AIA. I realize we may simply disagree,

I definitely am biased towards finding alternative solutions that avoid AIA fetching, much like Chrome has spent considerable effort to find alternative solutions to OCSP fetching. I personally haven't made up my mind on the issue because I feel there isn't sufficient evidence to make a conclusion. So, I disagree that we must agree to disagree so quickly; I think if everybody is open-minded then we could agree to try to agree to agree.

AIA fetching represents an important aspect of PKI mobility and changes, much like root autoupdates do.

I think we agree that root autoupdates are good, so we're off to a good start towards agreeing to agree already. My suggestion is that people experiment with downloading the intermediates that are commonly missing from the same place that roots are already downloaded from and secured using the same mechanisms, whereas the AIA fetching mechanism proposes to download them from an arbitrary site an attacker or perhaps even a legit peer asks us to download them from. My suggestion has the benefit that it can be (but doesn't have to be) implemented over HTTPS, which would work for clients that don't have a non-HTTPS HTTP stack at all, whereas the AIA fetching mechanism requires non-HTTPS HTTP capabilities in the client.
 
While you present it as "papering over a problem", it's equally fair (and perhaps more accurate) to highlight that it allows for PKI transitions to happen independently, without centralized coordination and management.

Could you make this more concrete? What do you mean, exactly, by a PKI transition? Could you provide a concrete example? Again, as far as I understand the original proposal, enabling such agility was not the problem that the proposal is trying to solve, so it might not matter anyway.
 
 
For this reason, I think it is worth trying to find an alternative that has positive short-term effects for Chrome for Android *and* positive long-term effects on the web as a whole.

As stated above, some of us believe that AIA fetching represents an important aspect of ecosystem health, and that all PKI clients should implement it for robustness.

I am sure that you genuinely believe this is true. But, I am still not convinced that it is necessary. If all HTTPS clients are going to be pushed towards implementing this, then I think it is important that the motivation is stated clearly.

While I know you have experience with Firefox deciding not to fetch intermediates,
 
Thank you for pointing out that Firefox doesn't implement AIA fetching at all; I wasn't sure if that would be considered an important factor in this discussion, so I didn't mention it before.

FWIW, it appears from reading Firefox's bug database that they're still open to the idea of preloading the intermediate certificates before they decide to implement AIA fetching. (That is based on my reading of decisions made by people other than me, after I left, so I might be misunderstanding them.)
 
it's also clear that Mozilla's decisions behind that have necessitated trusting CAs well beyond when it's advisable.

I doubt this is true, but I don't follow Mozilla stuff as closely as you seem to. What specific cases do you have in mind?
 
This problem has equally affected other consumers of the Mozilla Root Store, such as RHEL.

Red Hat doesn't even use the same certificate verification code as Firefox, so if Red Hat also does not enable AIA fetching in their products then that is their own independent choice.
 
 
FWIW, my hypothesis, which I haven't gotten around to verifying or refuting with an experiment, is that a small number of intermediates likely account for a huge percentage of the problem, and that if the browser simply shipped with a small number (say, less than 100) of these intermediates, without doing AIA fetching on any platform, the problem would mostly go away, while minimizing the negative effect of encouraging misconfiguration over time. I think it would be great to have a browser try an experiment along these lines.

I believe the Web PKI is best suited when avoiding such centrally managed solutions. Perhaps this may change in time, particularly as the industry moves to better disclosure of intermediates in a technically discoverable way.

This is my main point: The more implementations that implement AIA fetching now, the harder it will be to drop it later in favor of better alternatives if/when they become viable.
 

We may simply have to agree to disagree about the value of AIA.

Again, I hope not!

Regardless, I think it would be helpful to other implementers if Google shared the data it has gathered on this issue in a way that is useful, along with whatever useful feedback it may have on other potential solutions.

Cheers,
Brian

Ryan Sleevi

Sep 22, 2016, 9:40:58 PM
to Brian Smith, Ryan Sleevi, Emily Stark, net-dev, Eric Roman
On Thu, Sep 22, 2016 at 6:22 PM, Brian Smith <br...@briansmith.org> wrote:
My suggestion is that people experiment with downloading the intermediates that are commonly missing from the same place that roots are already downloaded from and secured using the same mechanisms, whereas the AIA fetching mechanism proposes to download them from an arbitrary site an attacker or perhaps even a legit peer asks us to download them from.

While true, this doesn't materially change the threat model, as browser clients must already be capable of loading arbitrary certificates (the sites), and capable of following arbitrary resources (after all, this is Hypertext).

That is, to a browser client, an AIA fetch is conceptually quite similar to fetching jQuery from a CDN (secured with SRI).

I can understand the concern for non-browser clients, and the potential ecosystem value, but I think if we entertain such clients, we're going to need to more carefully evaluate on a client-by-client basis. At the end of the day, we certainly need to ensure we're at least doing what's best for Chromium users.
 
Could you make this more concrete? What do you mean, exactly, by a PKI transition? Could you provide a concrete example? Again, as far as I understand the original proposal, enabling such agility was not the problem that the proposal is trying to solve, so it might not matter anyway.

As Emily clarified, we see both: things that are much more clearly misconfigurations (such as leaf-only certs), and cases where the variety of root stores across Android revisions - which, unfortunately, don't autoupdate - requires the ability to handle transitions. A PKI transition can be observed in both sites I posted - crt.sh (the transition away from AddTrust) and google.com (the transition away from Equifax). In both cases, new roots are stood up, which are supported on new clients, but older clients need older intermediates.

When sites are going through such transitions, they have to choose who the 'default' configuration works for. When a new version of Android is released, for example, the site likely can't rely on that version reaching 100% ubiquity within days, and thus needs to continue to supply the 'old' chain to the root. However, it equally can't wait for 100% ubiquity of the new version before it may decide to stop sending the older chain - even if that's what Google has, in effect, been doing.

As you know, this is complicated all the more during the transition from, say, RSA-1024 bit to RSA-2048 bit. The lack of AIA fetching, for example, has required applications running on RHEL (as well as some versions of Android) to support RSA-1024 bit roots, to facilitate the transition, even when it's completely undesirable to support 1024-bit keys *and* paths of purely 2048-bits are possible.
 
I am sure that you genuinely believe this is true. But, I am still not convinced that it is necessary. If all HTTPS clients are going to be pushed towards implementing this, then I think it is important that the motivation is stated clearly.

While it sounds like you're positioning this as a Chrome move that pushes the ecosystem one way, it's worth noting this is true on most platforms and browsers. The exception, rather than the rule, is Chrome on Android and Firefox. Firefox's stance is largely possible because they consider changes only relevant to the current version of Firefox - older root stores are not considered, asymmetric release schedules (such as OS vs root store) are not considered, and root autoupdate (or the lack thereof) is not considered. While it's elegant in some ways, it unfortunately does not reflect where the broader ecosystem - of browsers and non-browser TLS clients - is.
 
FWIW, it appears from reading Firefox's bug database that they're still open to the idea of preloading the intermediate certificates before they decide to implement AIA fetching. (That is based on my reading of decisions made by people other than me, after I left, so I might be misunderstanding them.)

Indeed, and in a forward-thinking world in which all paths are permuted, and all versions of the root store are known (as in the case of Firefox), this is perhaps tenable. However, for clients such as Chrome - which execute on a variety of platforms that may have disjoint stores - the notion of a 'one size fits all' solution is not consistent with how the WebPKI historically works or, I would argue, is desired to work. At a minimum, an 'intermediate bundle' would need to be tailored per set of trusted CAs (to be minimal), or else contain unrelated or perhaps conflicting intermediates (if using a single union).
 
I doubt this is true, but I don't follow Mozilla stuff as closely as you seem to. What specific cases do you have in mind?

Equifax
 
Red Hat doesn't even use the same certificate verification code as Firefox, so if Red Hat also does not enable AIA fetching in their products then that is their own independent choice.

While this may be true, above you make the argument for 'the ecosystem', and so it's important to consider where the ecosystem is.
 
This is my main point: The more implementations that implement AIA fetching now, the harder it will be to drop it later in favor of better alternatives if/when they become viable.

I think we might disagree here, but ultimately, it depends on the objectives. I believe you're attaching a distinction to different types of configurations - one, such as only supplying the leaf cert, is 'bad' and the 'server's fault', and so displaying a warning is entirely appropriate to force it to send additional certs - while the other, the aforementioned root store disharmony, is 'good', and something the client should fix (perhaps with better alternatives).

I don't believe that, in the system we have, we can reliably or meaningfully distinguish those two in a way that will result in a positive outcome, and it's better to overcorrect for the ecosystem and minimize user interstitials than it is to take the ideological approach and refuse to fix.

I'm incredibly sympathetic to the ecosystem arguments, but as I mentioned in the other mail, I believe that some form of AIA fetching is a net-win for all clients, properly optimized (such as with proper caching). And once that feature is introduced, the ability to distinguish 'bad' from 'good' is necessarily lost, and I don't see that as a bad thing either, which may be our point of disagreement.

Brian Smith

Sep 22, 2016, 11:54:36 PM
to Emily Stark, net-dev, Ryan Sleevi, Eric Roman
Thanks for your helpful reply, Emily. Replies inline.

Emily Stark <est...@chromium.org> wrote:
On Thu, Sep 22, 2016 at 5:04 PM, Brian Smith <br...@briansmith.org> wrote:

Could you also share the documentation of the alternatives to AIA fetching you explored and why they are unacceptable? Also, what reasons has the Android team given for avoiding AIA fetching for so long?

I'm hesitant to speak for another team and it's a bit difficult to answer this without doing so. I guess one thing to point out is that I haven't personally heard the arguments you make below as one of the reasons that Android doesn't do AIA fetching. To me, the important thing is the slow release/update cycle I mentioned in the doc. Even if Android goes all-in on AIA fetching tomorrow, we'd probably still want it in Chrome until the Android implementation reaches enough users.

I also don't know why Android hasn't done AIA fetching up to this point, or why Chrome for Android doesn't do AIA fetching up to this point. I do seem to remember it being said years ago (yes, I've been discussing this issue with people for years) that it was a conscious decision to *not* implement AIA fetching on Android, but I don't think any specific reasons were given then either. Regardless, I was hoping that some of the reasons for not doing AIA fetching must have been brought up in some discussion.

Independent of Android, we've talked about doing outreach to site owners: contact the top N sites that have misconfigured intermediates, do a notification in Webmaster Tools, something in DevTools, etc. I think those are things worth exploring in parallel.

These seem like useful things, though I would guess that they won't change the state of things too much on Android unless you simulate Android's certificate validation logic and root store(s) in the Chrome developer tools, which seems like a lot of work relative to the effectiveness I would expect it to have.
 

In the document, you write "Using data from Chrome’s Safe Browsing Extended Reporting program, we estimate that server chains with incorrect or missing intermediates account for >10% of all certificate validation errors in Chrome, and >30% of all certificate validation errors that occur in Chrome for Android. About 90% of the errors caused by missing or misconfigured intermediates occur on Android."
 Could you explain how these numbers 10%, 30%, and 90% are calculated? Are the 10%/30%/90% numbers indicative of the number of times that users see the certificate error page, or are they indicative of the number of pageloads with such certificate errors, or are they indicative of the number of TLS connections with such certificate errors, or HTTP requests on TLS connections with such certificate errors, or something else?


Chrome sends a report every time an opted-in user sees the certificate error page, and the 10%/30%/90% is the percentage of those reports that we estimate are due to misconfigured intermediates.

Usually when we look at the compatibility impact of a change, we do it in terms of percentage of total pageloads or similar. For example, in the "Intent to Deprecate/Remove" emails, there is almost always a use count figure quoted that shows the compatibility impact is minimal. In order to help convert the numbers 10%/30% into comparable figures, could you share the total percentage of pageloads that result in any kind of certificate error page?
 
 
In particular, do you expect that this change will account for a 10%/30% reduction in certificate error pages seen by users? And, if this change isn't expected to improve things by the full 10%/30%, then what is expected improvement?

I'm not sure if the above explanation answers this or not? Conceptually, I'd expect a 30% reduction in certificate error pages seen by Android Chrome users.

Yes, you answered exactly the question I had. Thanks!

Cheers,
Brian

Emily Stark

Sep 23, 2016, 2:17:53 PM
to Brian Smith, Emily Stark, net-dev, Ryan Sleevi, Eric Roman
On Thu, Sep 22, 2016 at 8:54 PM, Brian Smith <br...@briansmith.org> wrote:
I also don't know why Android hasn't done AIA fetching up to this point, or why Chrome for Android doesn't do AIA fetching up to this point. I do seem to remember it being said years ago (yes, I've been discussing this issue with people for years) that it was a conscious decision to *not* implement AIA fetching on Android, but I don't think any specific reasons were given then either. Regardless, I was hoping that some of the reasons for not doing AIA fetching must have been brought up in some discussion.

As for why Chrome for Android doesn't do AIA fetching up to this point, I don't have all the historical context, but from what I've seen, we've only recently realized how big an impact it has on users.
 

Independent of Android, we've talked about doing outreach to site owners: contact the top N sites that have misconfigured intermediates, do a notification in Webmaster Tools, something in DevTools, etc. I think those are things worth exploring in parallel.

These seem like useful things, though I would guess that they won't change the state of things too much on Android unless you simulate Android's certificate validation logic and root store(s) in the Chrome developer tools, which seems like a lot of work relative to the effectiveness I would expect it to have.

Yep, for something like Webmaster Tools, we could conceivably simulate Android's certificate verification, but for DevTools, not so much. There I was thinking that we might be able to do something super simple that covers some but not all cases, like warn if we successfully built a chain but the server only served a leaf. I know Ryan doesn't think this kind of stuff belongs in DevTools though, so you two might be able to agree to agree on that point at least. :)
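Something along these lines, say (names illustrative, not a real DevTools API):

  // After verification succeeds:
  if (server_supplied_chain.size() == 1 && verified_chain.size() > 1) {
    // The chain only verified because this client located the intermediates
    // itself; clients that don't fetch them (e.g. Android) may show an error.
    LogDevToolsWarning(
        "Server supplied only a leaf certificate; clients that do not fetch "
        "intermediates may fail to verify this site.");
  }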

One thing I haven't looked closely at is whether it's a small number of sites causing most of the problem. I half-expect that, among these ideas for reaching site owners, individual outreach to top sites might have the biggest impact (similar to your hypothesis about 100 intermediates).
 
 

Usually when we look at the compatibility impact of a change, we do it in terms of percentage of total pageloads or similar. For example, in the "Intent to Deprecate/Remove" emails, there is almost always a use count figure quoted that shows the compatibility impact is minimal. In order to help convert the numbers 10%/30% into comparable figures, could you share the total percentage of pageloads that result in any kind of certificate error page?

Unfortunately I can't share an exact number, but it's well under 1% of page loads. Another, somewhat more qualitative way to look at it is that certificate errors are consistently a top source of user complaints.
 
 
 

Brian Smith

Sep 28, 2016, 7:35:23 PM
to Emily Stark, net-dev, Ryan Sleevi, Eric Roman
Emily Stark <est...@chromium.org> wrote:

Yep, for something like Webmaster Tools, we could conceivably simulate Android's certificate verification, but for DevTools, not so much. There I was thinking that we might be able to do something super simple that covers some but not all cases, like warn if we successfully built a chain but the server only served a leaf. I know Ryan doesn't think this kind of stuff belongs in DevTools though, so you two might be able to agree to agree on that point at least. :)

I think what makes sense to do with Chrome devtools depends on answering the question that hasn't been answered yet: With respect to intermediate certificates, what is the minimum that a server is required to provide?
 
One thing I haven't looked closely at is whether it's a small number of sites causing most of the problem. I half-expect that, among these ideas for reaching site owners, individual outreach to top sites might have the biggest impact (similar to your hypothesis about 100 intermediates).

Right. A small number of sites causing the issue would imply a small number of distinct missing intermediates. And, more generally, the number of distinct intermediates issued by publicly-trusted CAs is pretty small.

Usually when we look at the compatibility impact of a change, we do it in terms of percentage of total pageloads or similar. For example, in the "Intent to Deprecate/Remove" emails, there is almost always a use count figure quoted that shows the compatibility impact is minimal. In order to help convert the numbers 10%/30% into comparable figures, could you share the total percentage of pageloads that result in any kind of certificate error page?

Unfortunately I can't share an exact number, but it's well under 1% of page loads. Another, somewhat more qualitative way to look at it is that certificate errors are consistently a top source of user complaints.

Thanks for checking. Interestingly, I couldn't find the number from Firefox's telemetry either.

Unfortunately the material difference is really more precisely what fraction of 1% it is. For example, an impact of 0.32% was categorized as "fairly high" in one Blink intent-to-remove thread, whereas an impact of 0.01% was considered OK for removing DHE cipher suites. I think that it is reasonable to consider the compatibility impact of how missing intermediate certificates are dealt with using the same scale that was used for measuring the DHE change and other similar compatibility-related changes.

More to the point, as there's not going to be any solution that works perfectly, it's hard to evaluate any solution without knowing what the target goal is and how far away from the goal we are. For example, let's say the TLS connection failure rate due to missing intermediate certificates is 0.02%. Then an immediate solution that cut that rate in half, in conjunction with other efforts with longer latency, like the site outreach you mentioned, might be more than acceptable, even good. But if the TLS connection failure rate due to missing intermediate certificates is 0.9% then it is pretty likely that no one immediate change is going to have enough of an impact to declare success.
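
(To spell out the bound with numbers already in this thread: if error pages are "well under 1%" of pageloads and ~30% of error reports are attributed to missing intermediates, then the affected fraction is somewhere below 0.3% of pageloads. Illustratively:

    // Upper bound on pageloads affected by missing intermediates, using
    // only figures quoted in this thread; the 1% is itself an upper bound.
    #include <iostream>

    int main() {
      const double error_page_rate = 0.01;        // "well under 1%"
      const double missing_intermediates = 0.30;  // ~30% of error reports
      std::cout << "Affected pageloads: < "
                << error_page_rate * missing_intermediates * 100 << "%\n";
    }

That range straddles both precedents - 0.32% "fairly high", 0.01% acceptable - which is exactly why the precise fraction matters.)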

Brian Smith

unread,
Sep 28, 2016, 8:38:14 PM9/28/16
to Ryan Sleevi, Emily Stark, net-dev, Eric Roman
Ryan Sleevi <rsl...@chromium.org> wrote:
On Thu, Sep 22, 2016 at 6:22 PM, Brian Smith <br...@briansmith.org> wrote:
My suggestion is that people experiment with downloading the intermediates that are commonly missing from the same place that roots are already downloaded from, secured using the same mechanisms, whereas the AIA fetching mechanism downloads them from whatever site an attacker - or perhaps even a legit peer - asks us to download them from.

While true, this doesn't materially change the threat model, as browser clients must already be capable of loading arbitrary certificates (the sites), and capable of following arbitrary resources (after all, this is Hypertext).

That is, to a browser client, an AIA fetch is conceptually quite similar to fetching JQuery from a CDN (secured with SRI).

I think that's oversimplifying things too much because it doesn't take into account the difference in what happens in the trusted parent process vs the less-trusted content processes. For example, JQuery from a CDN is being parsed and processed in a less-trusted content process whereas the processing of the AIA response, including in particular the parsing of the ASN.1 wrapper of the returned certificates, is done in the trusted parent process, IIUC. Further, a website can use CSP (including, in particular, block-all-mixed-content) and other mechanisms to limit what third parties are contacted when connecting to it, except for OCSP and AIA fetching, which are beyond its control.

Also, I encourage you to read the section "What's the story with certificate revocation" in the Chromium Security FAQ at https://www.chromium.org/Home/chromium-security/security-faq#TOC-What-s-the-story-with-certificate-revocation- where it is noted "Additionally, non-stapled OCSP poses a privacy problem: in order to check the status of a certificate, the client must query an OCSP responder for the status of the certificate, thus exposing a user's HTTPS browsing history to the responder (a third party)." AIA fetching has basically the same risks as OCSP (including also a similarly terrible performance and reliability impact).

Also, in general it seems there is a common goal to move to an HTTPS-only web, but AIA fetches are virtually always http://, so they can't be a long-term solution. Browsers using a mechanism that relies on http:// fetching places a compatibility burden on non-browser clients that would prevent them from disabling non-https:// fetching entirely.
 
As Emily clarified, we see both: things that are much more clearly misconfigurations (such as leaf-only certs), and cases where the variety of root stores across Android revisions - which, unfortunately, don't autoupdate - means clients need the ability to handle transitions. A PKI transition can be observed in both sites I posted - crt.sh (the transition away from AddTrust) and google.com (the transition away from Equifax). In both cases, new roots are stood up, which are supported on new clients, but older clients need older intermediates.

When sites are going through such transitions, they have to choose who the 'default' configuration works for. When a new version of Android is released, for example, the site likely can't rely on that version reaching 100% ubiquity within days, and thus needs to continue to supply the 'old' chain to the root. However, it equally can't wait for 100% ubiquity of the new version before it may decide to stop sending the older chain - even if that's what Google has, in effect, been doing.

In this instance, the problem is mostly due to the lack of an effective update mechanism in Android, and the lack of standardization of root stores across Android versions (if I understand http://android-developers.blogspot.com/2016/07/changes-to-trusted-certificate.html correctly, this latter part is fixed in Android 7). To be clear, I do think it is reasonable for a web browser running on Android to work around Android's limitations. The question is about what is the least harmful workaround that is effective.

If I understand you correctly, you are saying that some websites have to serve the old cert chain because of old Android and then hope that browsers running on newer Android will somehow be able to construct a different chain. But, in this case, in addition to the privacy concerns with the OCSP mechanism mentioned in the Chromium Security FAQ that also apply to AIA fetching, users running the newer Android have to sit around and wait for the AIA fetch to finish. From Adam Langley's blog post at https://www.imperialviolet.org/2012/02/05/crlsets.html we can see that "the median time for a successful OCSP check is ~300ms and the mean is nearly a second." Firefox's telemetry for Firefox 48 on Android (https://tinyurl.com/hfx42s9) reports the following (in milliseconds):

5th Percentile:  58.05
25th Percentile: 117.23
Median:          244.6
75th Percentile: 590.38
95th Percentile: 2.62k


As you know, this is complicated all the more during the transition from, say, RSA-1024 bit to RSA-2048 bit. The lack of AIA fetching, for example, has required applications running on RHEL (as well as some versions of Android) to support RSA-1024 bit roots, to facilitate the transition, even when it's completely undesirable to support 1024-bit keys *and* paths of purely 2048-bits are possible.

The lack of implementing *any workaround whatsoever* resulted in that. However, as mentioned in the Mozilla bug for the Equifax issue (https://bugzilla.mozilla.org/show_bug.cgi?id=1155279#c1), preloading the cross-signing intermediate was also a possible workaround. 
 
While it sounds like you're positioning this as a Chrome move that pushes the ecosystem one way, it's worth noting this is true on most platforms and browsers. The exception, rather than the rule, is Chrome on Android and Firefox.

Yes, I am not saying that Chrome on Android doing AIA fetching is worse than Chrome doing it on Windows or other platforms, or other browsers doing it. I think it would be better to find a solution to get rid of the AIA mechanism on *all* platforms and *all* browsers that doesn't break the web. (FWIW, I also think Firefox's caching of every intermediate cert it comes across on the web is bad for the web.)

The reason I'm encouraging you to find a better solution here, specifically for Chrome for Android, is that Chrome for Android is likely the deciding vote on the matter; if Chrome on Android does it, then it seems very likely everybody, including many non-browser clients, will have to do it.
 
Firefox's stance is largely possible because they consider changes only relevant to the current version of Firefox - older root stores are not considered, asymmetric release schedules (such as OS vs root store) are not considered, root autoupdate (or the lack thereof) is not considered. While it's elegant in some ways, it unfortunately does not reflect where the broader ecosystem - of browsers and non-browser TLS clients - is.

This is the same kind of argument that was made to discredit Chrome when it decided to stop doing OCSP fetching, that made it hard to do mixed content blocking, and that makes progress hard in general. We do have to recognize the current reality, but we can change it.
 
 
FWIW, it appears from reading Firefox's bug database that they're still open to the idea of preloading the intermediate certificates before they decide to implement AIA fetching. (That is based on my reading of decisions made by people other than me, after I left, so I might be misunderstanding them.)

Indeed, and in a forward-thinking world in which all paths are permuted, and all versions of the root store are known (as in the case of Firefox), this is perhaps tenable. However, for clients such as Chrome - which execute on a variety of platforms that may have disjoint stores - the notion of a 'one size fits all' solution is not consistent with how the WebPKI historically works or, I would argue, is desired to work. At a minimum, an 'intermediate bundle' would need to be produced per set of trusted CAs (to be minimal), or would contain unrelated or perhaps conflicting intermediates (if using a single union).

This seems like an argument that we couldn't do preloading of intermediates perfectly or with 100% coverage. But AIA fetching isn't perfect either. It is excruciatingly slow, and networks aren't reliable, especially on mobile. Also, not every incomplete certificate chain contains an AIA fetch URL to even enable AIA. The presence of the AIA and OCSP URLs in the certificates also makes the certificates bigger, which causes its own negative effects. And AIA fetching is by far the most complex feature of certificate validation, especially for an implementation that doesn't do OCSP fetching. IMO the cost of making all this a mandatory part of the Web PKI isn't justified by the need to work around Android's problems.
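
(To illustrate what I mean by complexity: even the simplified core of the mechanism - ignoring caching, timeouts, loop detection, concurrent verifications, and the CMS response format - looks roughly like the following. All names here are illustrative placeholders, not Chromium's actual interfaces:

    #include <optional>
    #include <string>
    #include <vector>

    struct Cert {
      std::vector<std::string> ca_issuers_urls;  // from the AIA extension
    };

    // Stubs standing in for the real path builder and network stack.
    bool BuildChainToTrustedRoot(const Cert& leaf,
                                 const std::vector<Cert>& candidates);
    std::optional<Cert> FetchCertOverHttp(const std::string& url);

    bool VerifyWithAiaFetching(const Cert& leaf,
                               std::vector<Cert> candidates) {
      if (BuildChainToTrustedRoot(leaf, candidates))
        return true;  // the common, no-fetch case
      // Path building failed: try one round of AIA fetching. A real
      // implementation iterates (a fetched intermediate may carry its own
      // AIA pointer) and must bound the total number of fetches.
      for (const std::string& url : leaf.ca_issuers_urls) {
        if (std::optional<Cert> fetched = FetchCertOverHttp(url))
          candidates.push_back(*fetched);
      }
      return BuildChainToTrustedRoot(leaf, candidates);
    }

Every piece of that - the fetcher, the retry policy, the parser for whatever comes back - is new attack surface in whichever process runs it.)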
 
 
I doubt this is true, but I don't follow Mozilla stuff as closely as you seem to. What specific cases do you have in mind?

Equifax

See above. The Equifax issue was due to the fact nobody did anything to work around it. AIA was one potential solution but it wasn't and isn't the only potential solution.

 
 
This is my main point: The more implementations that implement AIA fetching now, the harder it will be to drop it later in favor of better alternatives if/when they become viable.

I think we might disagree here, but ultimately, it depends on the objectives. I believe you're attaching a distinction to different types of configurations - one, such as only supplying the leaf cert, is 'bad' and the 'server's fault', and so displaying a warning is entirely appropriate to force it to send additional certs - while the other, the aforementioned root store disharmony, is 'good', and something the client should fix (perhaps with better alternatives).

I'm definitely not making that distinction.
 
I don't believe that in the system we have, we can reliably or meaningfully distinguish those two in such a way that it will result in a positive outcome, and it's better to overcorrect for the ecosystem and minimize user interstitials than it is to take the ideological approach and refuse to fix.

I agree 100%. And, in particular, nobody has ever suggested that we refuse to fix the problem.
 
I'm incredibly sympathetic to the ecosystem arguments, but as I mentioned in the other mail, I believe that some form of AIA fetching is a net-win for all clients, properly optimized (such as with proper caching). And once that feature is introduced, the ability to distinguish 'bad' from 'good' is necessarily lost, and I don't see that as a bad thing either, which may be our point of disagreement.

I don't make a distinction between "good" and "bad" like you imply and so I don't think that's the root of the disagreement. I think we agree on lots of things, such as "proper caching." The main disagreement, if there is any, seems to be about the mechanism in which the cache is populated. But, maybe there is less disagreement than I thought. Let me check:

1. Are you open to the possibility of pre-populating the cache from some browser update or root-store source, like CRLSets are already pre-populated?

2. Are you open to measuring the effects that such pre-populating the cache would have?

3. Are you open to the possibility that pre-populating and occasionally refreshing the cache, as is done already for CRLSets, may mitigate the raised problems enough that we could avoid the hazards, poor user experience, and complexity inherent in AIA fetching?
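
(To make the first question concrete, this is the kind of thing I have in mind - a hypothetical sketch of a CRLSet-style preloaded store consulted during path building, not a concrete proposal:

    #include <map>
    #include <string>
    #include <utility>
    #include <vector>

    class PreloadedIntermediateStore {
     public:
      // Invoked when a component update delivers a new bundle, over the
      // same delivery path CRLSets use today.
      void Update(std::map<std::string, std::vector<std::string>> bundle) {
        by_subject_ = std::move(bundle);
      }

      // During path building: given the issuer name an incomplete chain
      // is missing, return candidate intermediates (DER) by subject.
      std::vector<std::string> FindBySubject(
          const std::string& subject) const {
        auto it = by_subject_.find(subject);
        return it == by_subject_.end() ? std::vector<std::string>()
                                       : it->second;
      }

     private:
      // Normalized subject name -> DER certs with that subject.
      std::map<std::string, std::vector<std::string>> by_subject_;
    };

The point being that the lookup is local and synchronous; the network cost is paid once, at update time, for everyone.)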

Ryan Sleevi

unread,
Sep 28, 2016, 9:08:49 PM9/28/16
to Brian Smith, Ryan Sleevi, Emily Stark, net-dev, Eric Roman
I appreciate your feedback, Brian, but I'm afraid I have to point out there are serious flaws in your arguments here, as appealing as they are at first glance.

On Wed, Sep 28, 2016 at 5:38 PM, Brian Smith <br...@briansmith.org> wrote:
I think that's oversimplifying things too much because it doesn't take into account the difference in what happens in the trusted parent process vs the less-trusted content processes.

This is not a meaningful argument. You are going to parse the server-sent certificates in order to determine if they're trusted. For your argument to be that AIA harms security, you would need to show that there's no trust at all in the server. Unfortunately, this is easily and empirically demonstrable as 'no additional risk' beyond what's already accepted.
 
and other mechanisms to limit what third parties are contacted when connecting to it, except for OCSP and AIA fetching, which are beyond its control.

The introduction of OCSP is clearly a misdirect; let's focus on what we're discussing, which is AIA. Your assertion previously was that sites can control what third parties are contacted by supplying the 'correct' intermediates. I showed that the notion of 'correct' is flawed and improper. You now assert that the site cannot control AIA fetching. One, this is logically inconsistent with your previous argument; are you now swayed to believe that the notion of 'correct' is flawed, or are you still holding on to that argument? Two, there is clearly a choice in CA selection, and if we were to accept your previous supposition that there is a 'correct' way to do it (and perhaps simply CAs are not doing it yet), then clearly, there's choice. So I don't buy this argument at all - it's inconsistent with your previous arguments and internally inconsistent besides.
 
AIA fetching has basically the same risks as OCSP (including also a similarly terrible performance and reliability impact).

This is a gross oversimplification, and I'm disappointed to see it made, because I know you are intimately familiar with what I'm about to point out: Which is that OCSP directly refers to the certificate you're using (that is, you ask the CA "I'd like to ask about the certificate for google.com"), whereas AIA is asking the CA for a certificate the CA provides - that is, "Please tell me about your CA". This does not reveal the site you're using in practice.

Now, I anticipate you might respond with a hypothetical concern, which is 'what if' CAs wanted to track. And while that might be interesting from an academic standpoint, it doesn't fit into your narrative that they're "basically the same" (because they're clearly not), nor does it factor into CAs' interests and alignment, and it's trivially detectable and controllable (with respect to the WebPKI) using alternative means.

So this is not at all an argument.
 
Also, in general it seems there is a common goal to move to an HTTPS-only web, but AIA fetches are virtually always http://, so they can't be a long-term solution. Browsers using a mechanism that relies on http:// fetching places a compatibility burden on non-browser clients that would prevent them from disabling non-https:// fetching entirely.

This is to suggest the move to eliminate HTTP is on ideological purity grounds, which is not the case. This is not saying "All HTTP for all protocols is bad". It's acknowledging the risks and balances, and in the case of AIA fetches, the browser has strong confidence in the authenticity and integrity of the message (vis-a-vis the signature on the AIA-signed certificate), has strong confidence that the privacy is preserved (vis-a-vis CT and WebPKI disclosures), and has strong assurances that the risks of HTTP for browsing *do not apply*.


 
 (if I understand http://android-developers.blogspot.com/2016/07/changes-to-trusted-certificate.html correctly, this latter part is fixed in Android 7).

No.
 
The question is about what is the least harmful workaround that is effective.

Your argument implicitly suggests this causes harm, but you have yet to establish any harm, and the lack of AIA fetching is causing demonstrable harm.

But, in this case, in addition to the privacy concerns with the OCSP mechanism mentioned in the Chromium Security FAQ that also apply to AIA fetching,

Again, this is not accurate.
 
users running the newer Android have to sit around and wait for the AIA fetch to finish.

Yes. They do. 
 
From Adam Langley's blog post at https://www.imperialviolet.org/2012/02/05/crlsets.html we can see that "the median time for a successful OCSP check is ~300ms and the mean is nearly a second."

This is comparing chickens and eggs. As you know, OCSP responders are unfortunately deployed quite often in 'live' scenarios (e.g. signing on the fly). AIA does not have that.

As you know, OCSP responses are not amenable to caching (which is why Microsoft pushed for things like "High Performance OCSP" profiles, which our other platforms - notably NSS - did not implement at the time we measured), and even under ideal conditions are limited to 7 days. AIA caches are measured in years.
 
Firefox's telemetry for Firefox 48 on Android (https://tinyurl.com/hfx42s9) reports the following (in milliseconds):

5th Percentile:  58.05
25th Percentile: 117.23
Median:          244.6
75th Percentile: 590.38
95th Percentile: 2.62k

This is comparing apples to oranges.

I think it would be better to find a solution to get rid of the AIA mechanism on *all* platforms and *all* browsers that doesn't break the web. (FWIW, I also think Firefox's caching of every intermediate cert it comes across on the web is bad for the web.)

OK. There, we disagree, and I don't think we'll make much more germane progress here, considering you and I've been discussing this for years.

As discussed earlier on this thread, AIA plays an important role in the ecosystem transition. Eliminating AIA is not a priority at this time, nor do I agree it should be. We have bigger fish to fry to improve the ecosystem and users' experience, and AIA is, in effect, an auto-update mechanism for the Web PKI.

Hopefully your objections and concerns are addressed above, thoughtfully and with due consideration.
 
1. Are you open to the possibility of pre-populating the cache from some browser update or root-store source, like CRLSets are already pre-populated?

Explicitly: No.

There are far more important, user-facing, security-relevant things to be investing resources in.
 
2. Are you open to measuring the effects that such pre-populating the cache would have?

This suggests I accept as valid your arguments for the benefits of a cache. I explicitly reject that, and thus naturally reject the conclusion that there is value in this.

This is especially relevant given the performance measurements we've been making, in general, with respect to Android's certificate infrastructure. Such a solution is prima facie unacceptably slow, and getting the proposal onto a viable path first requires significant other investments. Those are the investments worth making, but this is not.
 

3. Are you open to the possibility that pre-populating and occasionally refreshing the cache, as is done already for CRLSets, may mitigate the raised problems enough that we could avoid the hazards, poor user experience, and complexity inherent in AIA fetching?

No, because I disagree with you on the hazards, I disagree with you on the poor user experience, and I disagree with you with the notion of it being inherently complex. 

Ryan Sleevi

unread,
Sep 28, 2016, 9:43:17 PM9/28/16
to Ryan Sleevi, Brian Smith, Emily Stark, net-dev, Eric Roman
On Wed, Sep 28, 2016 at 6:08 PM, Ryan Sleevi <rsl...@chromium.org> wrote:
I appreciate your feedback, Brian, but I'm afraid I have to point out there are serious flaws in your arguments here, as appealing as they are at first glance.

One aspect that I think is more relevant to this thread: We're no longer discussing Emily's proposal, but whether AIA fetching is fundamentally the right thing.

Even if AIA fetching were not, long-term, the right thing to be doing, we should do this on the basis of cross-platform consistency while we work out that story. And we shouldn't tie it up with a discussion of the future of AIA in an ideal Web PKI that has methods yet to be invented or tested, and which only work for the Web PKI (that is, which ignore entirely the use and utility of this in other PKI scenarios which Chromium/Chrome does and will continue to support, such as enterprise PKIs).

As I mentioned, AIA plays an important role, however, and while I wouldn't want to discourage you from writing up a concrete proposal for a future of a world without AIA, hopefully my responses have provided a greater degree of clarity with respect to the priorities (e.g. reducing user errors, on a result-/user-oriented approach, but also things like focusing on improving the Certificate Transparency story, on a PKI-gardening-oriented approach, or eliminating SHA-1, on a security-oriented approach) and the concerns.

Brian Smith

unread,
Sep 30, 2016, 12:52:57 AM9/30/16
to Ryan Sleevi, Emily Stark, net-dev, Eric Roman
Ryan Sleevi <rsl...@chromium.org> wrote:
On Wed, Sep 28, 2016 at 5:38 PM, Brian Smith <br...@briansmith.org> wrote:
I think that's oversimplifying things too much because it doesn't take into account the difference in what happens in the trusted parent process vs the less-trusted content processes.

This is not a meaningful argument. You are going to parse the server-sent certificates in order to determine if they're trusted. For your argument to be that AIA harms security, you would need to show that there's no trust at all in the server. Unfortunately, this is easily and emprically demonstrable as 'no additional risk' beyond what's already accepted.

Here's an example of the additional risk I am talking about: what appears to be a use-after-free in the certificate fetcher component in Chrome, in the parent process:


I read the code and noticed that you don't implement support for the CMS response format. That is good, though it is also a potential compatibility issue that might limit the effectiveness of AIA for solving this problem. If so, you may have to add support for the CMS response format, which again would further increase the attack surface, as you'd be adding a new data-format parser.
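
(For reference, since the two formats came up: a bare DER certificate and a CMS ContentInfo both start with an outer SEQUENCE, but the first element inside differs - a TBSCertificate SEQUENCE versus a ContentType OID - so responses can be sniffed. A rough sketch, assuming well-formed DER; a real parser would have to validate every length:

    #include <cstddef>
    #include <cstdint>
    #include <vector>

    enum class AiaResponseKind { kCertificate, kCms, kUnknown };

    AiaResponseKind SniffAiaResponse(const std::vector<std::uint8_t>& der) {
      if (der.size() < 2 || der[0] != 0x30)  // outer SEQUENCE
        return AiaResponseKind::kUnknown;
      std::size_t i = 1;
      std::uint8_t len = der[i++];
      if (len & 0x80) i += (len & 0x7F);  // skip long-form length octets
      if (i >= der.size()) return AiaResponseKind::kUnknown;
      switch (der[i]) {
        case 0x30: return AiaResponseKind::kCertificate;  // TBSCertificate
        case 0x06: return AiaResponseKind::kCms;          // ContentType OID
        default:   return AiaResponseKind::kUnknown;
      }
    }

The kCms branch is the parser you don't currently ship.)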

Again, remember my point was simply the obvious one that when you increase complexity you increase risk. In my own projects we have no use for a thing like your CertNetFetcher because we don't do OCSP or CRL fetching, and so adding such a thing is a considerable increase in complexity and risk. Thus, even if you find taking on such additional risks to be acceptable for Chrome, I hope you can understand why other projects might disagree, and thus why we'd consider this proposed change to Chrome for Android to be making the Web PKI more dangerous to implement in general, even if you yourself disagree.
 
 
and other mechanisms to limit what third parties are contacted when connecting to it, except for OCSP and AIA fetching, which are beyond its control.

The introduction of OCSP is clearly a misdirect;

 
let's focus on what we're discussing, which is AIA. Your assertion previously was that sites can control what third-parties are contacted by supplying the 'correct' intermediates.

I'm not sure what assertion of mine you're interpreting as saying this, but it is clearly not true, and so obviously so that I don't think I'd ever make the mistake of claiming it to be. See below.
 
I showed that the notion of 'correct' is flawed and improper.

Actually, you didn't make a convincing argument regarding this.
 
You now assert that the site cannot control AIA fetching.

I agree I am saying that a site cannot control AIA fetching. This is obvious in the case of an active MitM, where the active MitM could simply remove any/all intermediate certificates from the TLS handshake.

One, this is logically inconsistent with your previous argument; are you now swayed to believe that the notion of 'correct' is flawed, or are you still holding on to that argument?

I think you may be just misunderstanding something I previously wrote. At least, there appears to be some miscommunication.
 
Two, there is clearly a choice in CA selection, and if we were to accept your previous supposition that there is a 'correct' way to do it (and perhaps simply CAs are not doing it yet), then clearly, there's choice. So I don't buy this argument at all - it's inconsistent with your previous arguments and internally inconsistent besides.

I have to admit that I don't fully understand what you're getting at in the above sentence. I think we already agreed that, at least as far as current standards and practices are concerned, there could be cases where a site may have done everything that it could reasonably expect to do, and still not send a certificate chain that the browser would consider to be valid, unless the browser implements some workaround, be it AIA fetching or preloading intermediate certificates or caching previously-seen certificates or something else. So, I feel like perhaps we've already agreed to more than you've realized, or I'm not understanding you.
 
 
AIA fetching has basically the same risks as OCSP (including also a similarly terrible performance and reliability impact).

This is a gross oversimplification, and I'm disappointed to see it made, because I know you are intimately familiar with what I'm about to point out: Which is that OCSP directly refers to the certificate you're using (that is, you ask the CA "I'd like to ask about the certificate for google.com"), whereas AIA is asking the CA for a certificate the CA provides - that is, "Please tell me about your CA". This does not reveal the site you're using in practice.
 
Now, I anticipate you might respond with a hypothetical concern, which is 'what if' CAs wanted to track.  
 
I think it is fair to say that OCSP has more privacy risks than AIA fetching, except maybe in degenerate cases of AIA fetching like what you mentioned. I also think it isn't completely ridiculous to draw a line between the privacy risks such that OCSP crosses it but AIA fetching doesn't. However, again, others may draw the line somewhere else, as I do. I don't think I deserve the insinuation that I'm being dishonest because of that.

Also, in general it seems there is a common goal to move to an HTTPS-only web, but AIA fetches are virtually always http://, so they can't be a long-term solution. Browsers using a mechanism that relies on http:// fetching places a compatibility burden on non-browser clients that would prevent them from disabling non-https:// fetching entirely.

This is to suggest the move to eliminate HTTP is on ideological purity grounds, which is not the case.

I agree, because I wasn't suggesting anything about ideology. I am thinking more about how a web where clients have to implement AIA fetching to be compatible would prohibit HTTPS-only clients from implementing additional security measures that take advantage of their HTTPS-only nature. I am working on such things now and this work has nothing to do with ideology.
 
This is not saying "All HTTP for all protocols is bad". It's acknowledging the risks and balances, and in the case of AIA fetches, the browser has strong confidence in the authenticity and integrity of the message (vis-a-vis the signature on the AIA-signed certificate), has strong confidence that the privacy is preserved (vis-a-vis CT and WebPKI disclosures), and has strong assurances that the risks of HTTP for browsing *do not apply*.

Again, I think you should realize that others, such as myself, have more ambitious goals.
 
users running the newer Android have to sit around and wait for the AIA fetch to finish.

Yes. They do. 

Great. I'm glad we can agree on at least one thing.
 
 
From Adam Langley's blog post at https://www.imperialviolet.org/2012/02/05/crlsets.html we can see that "the median time for a successful OCSP check is ~300ms and the mean is nearly a second."

This is comparing chickens and eggs. As you know, OCSP responders are unfortunately deployed quite often in 'live' scenarios (e.g. signing on the fly). AIA does not have that.

As you know, OCSP responses are not amenable to caching (which is why Microsoft pushed for things like "High Performance OCSP" profiles, which our other platforms - notably NSS - did not implement at the time we measured), and even under ideal conditions are limited to 7 days. AIA caches are measured in years.
 
Firefox's telemetry for Firefox 48 on Android (https://tinyurl.com/hfx42s9) reports the following (in milliseconds):

5th Percentile:  58.05
25th Percentile: 117.23
Median:          244.6
75th Percentile: 590.38
95th Percentile: 2.62k

This is comparing apples to oranges.

Please post more accurate data. The OCSP numbers seem like a reasonable estimate of what we can expect for AIA fetching performance, but like any reasonable software engineer I'm happy to be proven wrong with better data.
  
1. Are you open to the possibility of pre-populating the cache from some browser update or root-store source, like CRLSets are already pre-populated?

Explicitly: No.

There are far more important, user-facing, security-relevant things to be investing resources in.
 
2. Are you open to measuring the effects that such pre-populating the cache would have?

This suggests I accept as valid your arguments for the benefits of a cache. I explicitly reject that, and thus naturally reject the conclusion that there is value in this.

Just FYI, you were the one that suggested "proper caching" in the first place.
 
3. Are you open to the possibility that pre-populating and occasionally refreshing the cache, as is done already for CRLSets, may mitigate the raised problems enough that we could avoid the hazards, poor user experience, and complexity inherent in AIA fetching?

No, because I disagree with you on the hazards, I disagree with you on the poor user experience, and I disagree with you with the notion of it being inherently complex. 

OK. If you're not willing to work together on this, then I'll find others to work with to solve this problem.