Accountability and best practices for CPS and code and/or process alignment

Ryan Hurst

Sep 17, 2021, 11:12:06 AM

to dev-secur...@mozilla.org

Hi MDSP community,

There have been a number of past issues where the actual practices followed by CAs deviated from their published practices. We recently had a delay in publishing our CPS; as a result, the published CPS still contained language accommodating a practice we were no longer using, because we had already deployed code changes to prevent that practice.

After reviewing past incidents and current requirements, and considering improvements we could make, we began to question whether there is an opportunity to improve the timeliness and accountability of CPS publications across the ecosystem so that they accurately reflect actual practices.

There are a number of common reasons that a CPS might drift, for example:

Canary deployments (https://martinfowler.com/bliki/CanaryRelease.html), where a change is rolled out to a small subset of users as an initial test before making it available to everybody.
Rapid deployment of a change containing a needed security fix or security enhancement.
Enhancing a process to be more restrictive than the CPS.
CPS updates may be necessary before a new type of certificate is issued to get the trust store approval, for example to get permission to issue code signing certificates.
Delays in obtaining reviews from external stakeholders such as the policy authority, executives, or legal team.

There are also a number of anti-patterns that might contribute to unnecessary delays in updates, for example:

Update processes where changes to the CPS are only made following a fixed schedule.
Bundling of many updates together into a large update, slowing review and publication.
Changes in practices not being reflected in the CPS due to process gaps.
Challenges prioritizing CPS reviews and edits against other high-priority items such as those in an action plan relating to an incident.

Looking at these cases, it seems that some drift between code behavior and the CPS, while not ideal, may be unavoidable. This is made more complicated by the fact that such drift is difficult for the community to detect.

This leads us to think that there may be value in adding to the BRs a clear bound within which drift is acceptable, with auditors expected to assess as part of their audits whether that requirement is being met.

Additionally, we thought it would be good to have a conversation to gather the community’s thoughts on the circumstances under which this happens in your environments and what you do to manage that drift. Any thoughts would be greatly appreciated.

Ryan Hurst

Google Trust Services

Tim Callan

Oct 11, 2021, 4:39:47 PM

to dev-secur...@mozilla.org, Ryan Hurst

Apologies, Ryan, for the delayed reply.

It’s valuable to note first that a CPS frequently states what the Certificate Authority will do under appropriate circumstances, rather than exclusively stating what it must do under all circumstances. This is easily demonstrable by the fact that multiple acceptable methods exist for certain required tasks, such as DCV. A CA may list multiple possible DCV methods in its CPS and make them all available for domain validation without taking on the obligation that all listed methods are employed in the case of every single domain.

This distinction is important. Several of the examples you give as reasons for drift do not, we believe, present any kind of CPS alignment problem. The CA should make updates to its CPS in advance of rolling out a new practice, for instance, so that when the practice is rolled out it will be in alignment with the published CPS. So long as the CA is stating that it MAY, rather than WILL ALWAYS, follow the new practice, this is perfectly fine. (And so long, naturally, as the practice is otherwise compliant with current requirements.)

Likewise, if the CA determines to add incremental controls beyond what the CPS states (and presumably beyond the minima necessary for root program and BR compliance), that will not represent a problem. We at Sectigo, for example, have a policy whereby all new OV/EV SSL certificates contain one discrete country name/state name combination that comes from our list of 6000+ possible combinations. Issuance of a certificate that contained something other than one of those combinations would be out of alignment with our policy, and we would deem such an occurrence to be a technical failure requiring investigation and resolution, as it should not have happened. It would not necessarily, however, represent CPS misalignment, as our CPS does not list those possible combinations.

The one example in the list that jumps out as important and in need of addressing is this idea of rapid responses to previously unknown circumstances. Walk with us through a brief thought experiment:

Let’s say that a CA comes to the realization that it failed to make a mandated update to a specific issuance practice affecting a subset of its certificates, call them Group A Certificates. The CA might make an all-hands-on-deck effort to write, test, and deploy a code fix to update its practice for Group A. However, if the updated (and otherwise compliant) procedural details are not in alignment with the previously published CPS, then Group A certificates will continue to count as misissued certs, even though all procedures follow what is specified in the CABF rules. The problem will persist until the new CPS is published.

This is despite the fact that the severity of the errors is not the same. Many CPS errors are essentially clerical errors. We have seen this recently with various off-by-one-second errors CAs are experiencing as compared to their practice statements. We appreciate the value of a CPS as a source of truth for details of how the CA operates. We appreciate the importance that such details be accurate. We also contend that there is a meaningful difference between issuing a 90-day leaf certificate that is over the limit stated in the CPS by one second and issuing an SSL certificate that exceeds 398 days by one second. While 90 days plus one second can create CPS misalignment, there is no sensible argument that it actually brings security or trust risk beyond the general expectations for public SSL certificates.
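
To make the one-second case concrete, here is a minimal sketch of how such a discrepancy can arise. It assumes the validity period is counted inclusively (from notBefore through notAfter, both endpoints included) and that a day is 86,400 seconds. The function names and the sample dates are hypothetical and do not come from any CA's actual code.

```python
from datetime import datetime, timedelta, timezone

SECONDS_PER_DAY = 86_400  # assumption: a day is counted as exactly 86,400 seconds


def validity_seconds(not_before: datetime, not_after: datetime) -> int:
    """Validity period in seconds, counting both endpoints (inclusive)."""
    return int((not_after - not_before).total_seconds()) + 1


def exceeds_limit(not_before: datetime, not_after: datetime, limit_days: int) -> bool:
    """True if the inclusive validity period is longer than limit_days."""
    return validity_seconds(not_before, not_after) > limit_days * SECONDS_PER_DAY


# Hypothetical certificate whose notAfter is set to notBefore + 90 days exactly.
not_before = datetime(2021, 10, 1, 0, 0, 0, tzinfo=timezone.utc)
naive_not_after = not_before + timedelta(days=90)

# Counted inclusively, this is 90 days plus one second: over a 90-day CPS
# limit, yet far below the 398-day limit discussed above.
over_cps_limit = exceeds_limit(not_before, naive_not_after, 90)
over_398_limit = exceeds_limit(not_before, naive_not_after, 398)

# Subtracting one second from notAfter brings the period back to 90 days.
fixed_not_after = naive_not_after - timedelta(seconds=1)
fixed_over_cps_limit = exceeds_limit(not_before, fixed_not_after, 90)
```

Under these assumptions the same certificate is out of alignment with a CPS that states a 90-day maximum while remaining well inside the 398-day limit, which is the difference in severity described above.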

One possible response is simply to say that a CA must be able to update its CPS in an agile fashion, and that surely it is quicker work to change a document and publish it than it is to make the other changes necessary to account for a compliance issue. As a policy, that has the advantage of being clean and unambiguous. Its disadvantage is subtler but worth uncovering.

Every requirement or regulation becomes another thing that a CA must implement and monitor. Each is like a tax that requires just a little bit of the CA’s work and attention. No requirement on its own seems like that big of a deal, but the more requirements we place, the more risk there is of performance failure. Death by a thousand cuts. The straw that broke the camel’s back.

Indulge us while we use an analogy from popular culture. At one point in the movie Pulp Fiction, boxer Butch Coolidge is hiding out from LA mobsters in a motel room. He goes to get his heirloom gold watch from the suitcase full of possessions he has asked his significant other, Fabienne, to pack and bring. It is not there because Fabienne missed it among the long list of items she was to get. Before going back to recover the watch, Butch expresses that it was a mistake to make a long list when doing so put the one possession he really cared about, the watch, in jeopardy.

The same possibility exists with CA rules. It’s easy to add another rule and simply say, “Well, if you don’t have the chops to get this done, maybe you’re not cut out for the life of a public CA.” And once again, that is one possible position. The weakness of that position is that it ignores the relative importance of our various rules. If CAs become swamped with rules that do not meaningfully improve security or trustworthiness of certificates, does that task list increase the risk of failure among the set of rules we truly care about? Are we being the best possible stewards of public online trust if we choose matters of trivial importance over the vital ones?

At a macro level we should be aware of this tradeoff whenever considering the rules we give ourselves as a community. It is the need for prioritization.

So bringing these deep thoughts back to the specifics of this thread, the question to ask is, what is the value of demanding full CPS/practices synchronization as opposed to allowing reasonable, short gaps to facilitate rapid response? It may be that allowing such gaps leads to more effective adjustments as the CA can remove the distraction of CPS review and publication until the immediate need is handled. In principle that could be worth it.

Of course, any ballot to make this change would have to codify exactly what time gaps were allowable under what circumstances. It’s easy to imagine that any permitted gaps would be short, a matter of a few days perhaps, and for good reason. What those reasons are and for how long we tolerate CPS/practice disconnect would require some work to figure out.
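
As an illustration only, a codified gap could be checked mechanically along the following lines. This is a sketch under stated assumptions: the `PracticeChange` record, the function names, and the 7-day window are all hypothetical, and no ballot or existing requirement defines them.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

MAX_GAP_DAYS = 7  # hypothetical allowance; the actual value would be set by ballot


@dataclass
class PracticeChange:
    description: str
    deployed: date                 # date the changed practice went live
    cps_published: Optional[date]  # date the matching CPS was published, if any


def drift_days(change: PracticeChange, today: date) -> int:
    """Days during which the deployed practice and the published CPS disagreed."""
    end = change.cps_published or today
    # A CPS published before deployment means there was never a gap.
    return max((end - change.deployed).days, 0)


def within_allowed_gap(change: PracticeChange, today: date,
                       max_gap_days: int = MAX_GAP_DAYS) -> bool:
    """True if the drift stays inside the permitted window."""
    return drift_days(change, today) <= max_gap_days


today = date(2021, 9, 20)
quick_fix = PracticeChange("urgent fix", date(2021, 9, 1), date(2021, 9, 4))
unpublished = PracticeChange("process change", date(2021, 9, 1), None)
cps_first = PracticeChange("new practice", date(2021, 9, 10), date(2021, 9, 3))
```

In this sketch `quick_fix` has a 3-day gap and passes, `unpublished` has been drifting for 19 days and fails, and `cps_first` (CPS updated before rollout) has no gap at all. A real rule would also need to state which circumstances qualify for the allowance, which this sketch does not attempt.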

Tim Callan

Sectigo
