Logs & Compliance (was Re: [ct-policy] Pilot and Aviator RFC compliance)


Ryan Sleevi

Jun 6, 2016, 8:31:26 PM
to ct-p...@chromium.org
First off, apologies for the delay in responding to this; it is not our normal response time. Between the CA/B Forum meeting and my being out of the office without email access, and given the (lack of) severity of this issue, we decided to wait until I was back before responding.

In the replies following this incident, there was a concerning trend of thought, and unfortunately, I fear I'm primarily to blame for it. Below are three sample replies from the two threads related to this:

On Thu, Jun 2, 2016 at 11:07 PM, Iñigo Barreira <inigo.b...@gmail.com> wrote:
Peter, if this is a violation of the policy, Google has clearly indicated the decision in the past. There´s no such "low impact" interpretation so as you say, Google will have to take an action on what to do in this matter.
 
On Sat, Jun 4, 2016 at 10:23 AM, Richard Salz <rich...@gmail.com> wrote:

So now we are waiting to here the Chromium response, right?

It seems a minor noncompliance, but I am not fully convinced users wouldn't be at risk. But it is noncompliance, so I expect to see the standard actions taken.

On Sat, Jun 4, 2016 at 10:51 PM, Matt Palmer <mpa...@hezmatt.org> wrote:
Given that these logs have failed to abide by a MUST criteria in RFC6962, does
Chromium intend to remove these logs from the trusted set?  If not, why not?

These three replies all fit the same pattern: assuming there is a single standard response, namely zero tolerance for any mistake. However, that's not what the Certificate Transparency Log Policy says. In particular, see the section on Policy Violations (and the sentence immediately before it, within the Ongoing Requirements).

It's not true that every violation has resulted in a log's removal. For example, consider:

These are all examples of failures to abide by the last requirement.

So why were some logs removed and others not? What's the logic behind it? Well, this is where I fear I'm to blame: while I've tried to explain comprehensively the impact of a log removal, I haven't taken a similar amount of time to explain the nature of a log removal, such that people seem to have come to assume removal is the default, or only, response.

I had tried to provide more details about the rationale with the respective Log Operators - certainly with Inigo more than anyone, due to the technical nature of it - but I haven't done as good a job of explaining this publicly.

Our first and foremost goal with Certificate Transparency is to better protect Chrome users. That's a bit self-serving, admittedly, but even if the only outcome is that Chrome becomes a more secure browser, that's still a positive win. This isn't unique to dealing with the PKI - Mozilla, Apple, Microsoft, and others all have policies that go above and beyond the CA/Browser Forum's Baseline Requirements to protect and meet their users' needs, and this is no exception. Like Mozilla, we want to operate in the open about these policies. We believe Chrome is in a unique position to improve global online security, and we work towards that, but at the end of the day, users - people - come first.

But that second goal is still present - improve the Web PKI and ecosystem. We want Certificate Transparency to be a thing others adopt, and it's clear that it's already showing significant value - detecting concerning practices at organizations such as Symantec, or helping Facebook detect certificates it didn't intend to be issued. These benefits have only been realized because of the nature of Chromium's Certificate Transparency Policy.

That second goal is nuanced, because it requires considering not just Chrome's needs, but also the needs of monitors, of other client implementations, of log operators, and of CAs. Our policies are designed to try to bootstrap the system that extends beyond just Chrome's specific needs, while also being mindful of what our users' security needs are.

To that end, the policies we developed - for Logs and CAs - were designed to address a variety of concerns - ranging from security to operational quality to ecosystem health. For example, the requirement that there must be N SCTs touches on operational concerns - we don't want to encourage a world where logs are "too big to fail" because distrusting a log (such as if it's compromised) could mean wide swaths of certificates are made untrusted. The requirement that at least one of the SCTs be from a Google log, however, touches on the security needs of our users: because Chrome is not yet checking SCT inclusion proofs, we don't have sufficient safeguards against logs providing malicious split views. However, because Chrome implicitly and explicitly trusts Google, and Google won't operate a split view any more than it would inject deliberately malicious code at targeted users, we are able to address that security concern. Similarly, the log policy of notifying Google of changes to the root certs is designed to help inform and shape the ecosystem discussion about how logs behave, whether they operate in the public interest / interest of Chrome's users / interest of monitors.
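The SCT requirements described above - at least N SCTs, at least one from a Google-operated log - can be sketched as a simple client-side check. This is a minimal illustration only: the required count and the log IDs are hypothetical placeholders, not Chrome's actual policy values or implementation.

```python
# Hypothetical sketch of the SCT diversity requirement: a certificate
# needs SCTs from enough distinct logs (so no single log is "too big
# to fail"), at least one of which is operated by Google (since Chrome
# does not yet fetch inclusion proofs to guard against split views).
# The constant and log IDs below are illustrative, not real values.

REQUIRED_SCT_COUNT = 2                               # assumed for illustration
GOOGLE_LOG_IDS = {"pilot-log-id", "aviator-log-id"}  # hypothetical IDs


def satisfies_ct_policy(sct_log_ids):
    """Return True if the set of logs that issued SCTs meets the policy."""
    distinct = set(sct_log_ids)
    if len(distinct) < REQUIRED_SCT_COUNT:
        return False  # too few logs: distrusting one log untrusts the cert
    if not (distinct & GOOGLE_LOG_IDS):
        return False  # no Google log: no safeguard against split views
    return True
```

The two conditions map directly onto the two concerns in the paragraph above: the count addresses operational resilience, the Google-log intersection addresses the split-view security gap.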

The uptime requirement is a matter both of ensuring a healthy ecosystem and of ensuring the ecosystem is reliable - for monitors, auditors, and Chrome users. A log that is unavailable is indistinguishable from a log that has failed to incorporate an SCT into the STH or, worse, one hiding a split view from monitors. That is, imagine a log that returned a 500 every time you queried an SCT from the 'hostile' view, or that simply dropped the connection. There is a real and practical security concern here, and on such matters we have so far taken a very strict approach, as evidenced with Certly.

Similarly, providing a split view of the log - which is, effectively, what Izenpe did - is an operational failure that, even if accidental, is indistinguishable from malice. Worse, it makes it difficult to implicitly trust the SCTs from the log (which is what all clients were doing then, and still do for all logs today), since they're not doing inclusion proof fetching. And if that inclusion proof fetch were to fail, it would be unclear whether the log was now compromised or whether it was just another 'operational' hiccup. Again, this is a serious security concern.
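To make the split-view failure mode concrete: under RFC 6962's Merkle Tree Hash (Section 2.1), two views of a log that differ in even one entry yield different tree heads, so monitors comparing STHs fetched from different vantage points can expose the split. A minimal sketch of that hash, with a toy two-view example (the entry contents are invented for illustration):

```python
import hashlib


def mth(entries):
    """RFC 6962 Merkle Tree Hash: leaves are hashed with a 0x00 prefix,
    interior nodes with 0x01, and the tree splits at the largest power
    of two strictly less than the number of entries."""
    n = len(entries)
    if n == 0:
        return hashlib.sha256(b"").digest()
    if n == 1:
        return hashlib.sha256(b"\x00" + entries[0]).digest()
    k = 1
    while k * 2 < n:
        k *= 2
    return hashlib.sha256(b"\x01" + mth(entries[:k]) + mth(entries[k:])).digest()


# Toy example: the 'hostile' view withholds one certificate.
honest = [b"cert-a", b"cert-b", b"cert-c"]
hostile = [b"cert-a", b"cert-b"]

# The roots differ, so cross-checking STHs from different vantage
# points detects the split - but only if the log actually answers.
assert mth(honest) != mth(hostile)
```

This is why an unavailable log is so troubling: a log that errors out or drops connections for queries against the 'hostile' view denies monitors exactly the comparison sketched here.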

However, as I explained, there are other matters where logs have been non-compliant, but in ways that could be resolved without incident, or that spoke more to operational failures. For example, Symantec briefly had encoding issues with SCTs due to zero padding. This resulted in invalid signatures on the SCTs embedded in the pre-certificate, but Symantec was able to correct this on the server and return correct signatures. The end effect was minimal - clients rejected the invalid SCTs, as expected. Similarly, the aforementioned failures to notify about changes in the root store have not been met with removal.

With that historical context and these motivating goals in mind, my inclination is to consider this a minor failure, and not one requiring action. I have a hard time identifying a security impact here; it does not seem meaningfully different from a log deciding to accept arbitrary roots for which the private key has been published. That is, the signature validation requirement, though present in RFC 6962, strikes me as more of a policy requirement (to mitigate DDoS) than an objectively technical or security requirement.
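For reference, the signatures at issue in the Symantec incident are computed over RFC 6962's digitally-signed SCT structure. A sketch of assembling the to-be-signed bytes for an X.509 entry, following the field layout of Section 3.2 (treat the specifics as illustrative rather than a reference implementation):

```python
import struct


def sct_signed_data(timestamp_ms, cert_der, extensions=b""):
    """Assemble the to-be-signed bytes for an SCT over an x509_entry
    (RFC 6962 Section 3.2): version(1) || signature_type(1) ||
    timestamp(8) || entry_type(2) || certificate<3-byte length> ||
    extensions<2-byte length>."""
    out = b"\x00"                        # sct_version: v1 (0)
    out += b"\x00"                       # signature_type: certificate_timestamp (0)
    out += struct.pack(">Q", timestamp_ms)
    out += struct.pack(">H", 0)          # entry_type: x509_entry (0)
    out += len(cert_der).to_bytes(3, "big") + cert_der
    out += struct.pack(">H", len(extensions)) + extensions
    return out
```

A log that signs over incorrectly encoded data - as in Symantec's zero-padding incident - produces signatures that fail verification against a correct reconstruction like this, which is exactly what compliant clients observed.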

There is the broader question of what it might mean for monitors who see certs with invalid signatures or, as in the above case, certificates from roots that are not trusted. That's more of an ecosystem issue, particularly for monitors. For example, while Google decided to set up the Submariner log as a distinct log for untrusted roots, that's not a present requirement in the Chromium Certificate Transparency Policy, and other logs might accept such roots. Strictly speaking, then, it would seem monitors already need to be carrying out their own validation of signatures and applying local policy (e.g. "is this trusted by the browsers/platforms that I, the site operator, care about"), and if so, this would have no impact.

Unfortunately, I don't have good answers for that, and I think that's a discussion that still needs to continue, in line with that second goal I mentioned - doing good for the ecosystem by making sure our policies reflect not just the needs of Chrome but the overall needs, until such a time as there's wider implementation and broader discussion.

I'm curious whether there's a perspective that hasn't been considered, or security risks that should be highlighted. The operational risks I've been able to think of - things like spam in logs, the size of logs, unchecked growth - are, fortunately or unfortunately, not presently addressed in the policy, so it doesn't seem right to make them a sticking point.

Hopefully this (rather long) email provides sufficient context for the rationale - rationale I've been very poor at explaining thus far, as evidenced by the length of this email. I also don't want this to be seen as suggesting Google exceptionalism - that is, that Google is somehow exempt from the policies. This was certainly a failure to abide by RFC 6962, much like Symantec's previous failure, and I'm not dismissing that. I know that the team responsible for Google's logs seeks to operate their logs to the same standard that all logs are held to, and we collectively believe the Chromium CT Policy offers a good general model for other clients' policies. As I mentioned, however, not all aspects of the Policy are designed around security - some are proxies to ensure the ecosystem matures appropriately, some are proxies for aspects of the ecosystem not yet finalized (SCT inclusion proof fetching via clients, gossip via clients) - and how we handle such failures is balanced by the security needs of Chrome and the desire to ensure a healthy, robust ecosystem.

Hopefully this provides greater clarity into the thinking, and can be seen as consistent with other decisions, such as the decision not to support name redaction, for now, even in the presence of a finalized RFC 6962-bis.