Hello,
I wanted to run an idea past the Mozilla community that’s sort of half-baked, but maybe the community can help flesh it out. This is mostly a personal idea of mine at this point, but if there is enough interest / support it might become a DigiCert initiative.
The idea is motivated by the observation that it’s absolutely impossible to determine with any confidence whether a particular certificate can be replaced within a 24 hour or a five day period (unlike many of the participants whose experience here is hypothetical, I’ve done my time in the trenches). Many customers can safely replace, with various degrees of expense and effort, but many customers can’t replace safely, and the incentives all point in the wrong direction, so getting reliable information to make determinations is a real challenge, and very time consuming. And as much as I love crypto agility, not having anyone get hurt or die is also pretty high on my priority list. And there are some cases where that’s exactly what’s on the table.
As I stated in Bergamo, instead of trying to apply band-aids to this very broken process, I think we need to entirely rethink how we do things in order to make any progress. So I’m going to start throwing out some ideas that might work, and maybe we’ll eventually converge on a solution that has a chance of working. Because the current system doesn’t.
This is just the first of a bunch of proposals, so here we go:
If a publicly-trusted certificate is difficult to replace, for various regulatory or technical reasons, the real reasons do not magically appear when rotation is necessary. But a host of fake reasons are likely to arise (“we can’t rotate certificates faster because it costs money we don’t want to spend”). Furthermore, making progress on this problem would be greatly assisted by better information about exactly which certificates can’t be replaced, the timescale on which they CAN be replaced, and why.
The world would be better if we all knew, IN ADVANCE, which certificates are automatically replaceable, and which aren’t. This would also greatly streamline operations when replacements are necessary, as it removes the burden of making the determinations with a ticking clock, which is a situation that doesn’t lend itself to careful and unbiased evaluations.
This would make things better in a number of ways:
The information could even be contained in a certificate extension, so that the rotation practices of these organizations are transparent. That would then make it possible to track the effectiveness of initiatives to reduce the barriers to rotation of WebPKI certificates. There’s even a chance we could actually use the information to make revocation and rotation better, instead of just arguing about it on internet forums!
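As a rough illustration of the advance-disclosure idea (not a defined certificate extension or any CA's actual format), the declared information could be modeled as a small record per certificate. All field names and sample values below are hypothetical; only the 24-hour and five-day windows come from the discussion above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class ReplaceabilityDeclaration:
    """Hypothetical advance disclosure recorded for one certificate."""
    serial: str                  # certificate serial number (illustrative)
    automated: bool              # True if replacement is fully automated
    max_replacement_hours: int   # declared worst-case time to replace
    reason: str = ""             # why replacement is slow, if it is


def revocation_deadline(notified_at: datetime, window_hours: int) -> datetime:
    """Time by which the certificate must be revoked for a given window."""
    return notified_at + timedelta(hours=window_hours)


def can_meet_window(decl: ReplaceabilityDeclaration, window_hours: int) -> bool:
    """True if the declared replacement time fits inside the revocation window."""
    return decl.max_replacement_hours <= window_hours


def flag_at_risk(decls, window_hours):
    """Return serials whose declared replacement time exceeds the window."""
    return [d.serial for d in decls if not can_meet_window(d, window_hours)]


# Made-up sample declarations, for illustration only.
declarations = [
    ReplaceabilityDeclaration("01", automated=True, max_replacement_hours=1),
    ReplaceabilityDeclaration("02", automated=False, max_replacement_hours=72,
                              reason="manual change-approval process"),
    ReplaceabilityDeclaration("03", automated=False, max_replacement_hours=240,
                              reason="regulatory re-certification required"),
]

at_risk_24h = flag_at_risk(declarations, 24)     # 24-hour revocation window
at_risk_5d = flag_at_risk(declarations, 5 * 24)  # five-day revocation window
```

With declarations like these collected ahead of time, the question of which certificates cannot meet a given window becomes a lookup rather than a judgment made against a ticking clock.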
One potential downside is that this would make critical certificates stick out like a sore thumb, but I think on balance, the transparency is more valuable than the disclosure risk. I’ve never been a huge security by obscurity fan, anyway.
I realize this would be a major change to how we do things, but we’ve been having this exact same conversation about certificate replacement for pretty much the entire decade I’ve been involved at CABForum, and I think it’s time for radical change. If this isn’t the right idea, it at least gives a sense of the kind of change that is needed to make progress here, and I would love to hear any other potential ideas for how we finally exit the traffic circle and start moving forward again.
-Tim
Dear Tim and Matt,
Thank you both for your insightful comments and contributions to the ongoing discussion regarding timely certificate revocation. Your perspectives are invaluable as we strive to find balanced and effective solutions to this problem.
Tim, your proposal to identify problematic certificates in advance and make this information transparent not only addresses the core issue of preparedness, but also encourages organizations to improve their crypto agility.
Matt, your questions and alternative proposal for regular, randomized revocation testing are equally thought-provoking. Regular testing would ensure that processes are robust, and that organizations remain vigilant about their revocation capabilities.
Given the complexity and importance of this issue, I would like to keep the discussion alive and invite additional comments from the Mozilla community.
Personally, I currently favor extending the timeframe for the revocation of certificates that have no security impact, e.g. to 20 days (exact language TBD – e.g. by adding a new subsection to section 4.9.1.1 of the Baseline Requirements). I understand that extending the timeframe from 5 days to 20 days for some types of revocations might raise questions about the empirical basis for my position, especially concerning our continued preparation for 24-hour revocations when security compromises like we experienced with Heartbleed happen, but here are some points to consider:
- My review of past Bugzilla incidents shows that many delayed revocations are not related to security issues, but to compliance details that do not pose immediate security risks.
- We have also received consistent feedback from CAs and subscribers that the 5-day window for these types of revocations is too restrictive and does not reflect the operational realities of many organizations.
- The current 5-day timeframe does not account for holidays, weekends, and other operational delays. Extending the timeframe provides a more realistic window for organizations to respond without compromising their operational integrity.
- Some organizations face legal and regulatory hurdles that make immediate revocation challenging, and extending the timeframe can help them comply with both CA/B Forum requirements and local laws.
- When adopting any security-related measure, such as revocation, a cost-benefit-based risk analysis should be done. The analysis should justify why a 5-day period is necessary when a 20-day period might be just as effective without imposing undue burdens.
- Finally, extending the timeframe for non-security-related revocations does not hinder preparation for 24-hour revocation timelines for critical security incidents. In fact, it allows organizations to better allocate resources and develop robust processes that can be quickly mobilized in the event of a security compromise.
But whatever decision we reach as consensus is good for me--our collective goal should be to find solutions that work best for the entire community, and it would be great if we could come up with some solutions and then recommend them to the Server Certificate Working Group of the CA/Browser Forum. To facilitate this, I propose that we continue to gather more input from the community, and try to understand the different perspectives, which will help us refine suggestions and identify potential challenges and solutions. Everyone’s continued engagement and support are crucial as we work towards a consensus. I encourage everyone in the community to share their thoughts and suggestions to help us develop a robust and effective strategy to improve security while reducing the number of CA incidents that are due to delayed revocation.
Thank you once again for your contributions, and I look forward to our continued collaboration on these important issues.
Best regards,
Ben
Hi Tim,
Thanks, everyone, for keeping this conversation going. It's essential that we continue because I believe the current framework is unworkable.
Ben
What are the issues you see from the perspective of a root program with the current framework?
Mike and Amir,
Here are some of the goals that come to my mind from the perspective of the Mozilla Root Program, followed by my short response concerning what to do with the current framework.
Currently, the 5-day revocation period is not working effectively, as evidenced by ongoing issues documented in Bugzilla. As I said before, I’d like to reach a consensus determination on what is best for the ecosystem. While I understand the argument for stricter revocation timelines, I believe there are broader considerations based on how this valuable TLS technology is currently being used to support healthcare, airlines, banking, etc.
Contemporaneously with this discussion here, I plan to turn my attention to GitHub Issue #276 and start addressing the issue with better guidance in the wiki about reporting expectations and with new language (TBD) to be added to the Mozilla Root Store Policy. I also plan to be more proactive in commenting on CA compliance reports.
In summary, Mozilla's goals align closely with those of other root programs--maintaining control over CAs and minimizing their non-compliance while ensuring secure and effective CA operations.
Thanks, and keep the conversation going so that we can come to some consensus.
Ben
Dear Ben,
Thanks for your effort to re-ignite the discussion.
Personally, not speaking as a representative of my employer, I suggest two things to balance the interests:
Finding good and workable solutions is often a give-and-take from all involved parties. Maybe taking away the leeway for exceptions but giving the 3rd deadline is something that leaves all sides both happy and unhappy enough to accept it? 😉
Rgds
Roman
From: 'Ben Wilson' via dev-secur...@mozilla.org <dev-secur...@mozilla.org>
Sent: Wednesday, 24 July 2024 23:06
To: dev-secur...@mozilla.org
Cc: Mike Shaver <mike....@gmail.com>; Matt Palmer <mpa...@hezmatt.org>; Tim Hollebeek <tim.ho...@digicert.com>; Amir Omidi (aaomidi) <am...@aaomidi.com>; Wayne <rdau...@gmail.com>; Jeremy Rowley <jeremy...@digicert.com>
Subject: Re: Feasibility of a binding commitment to revoke before issuance
Thanks, everyone, for keeping this conversation going. It's essential that we continue because I believe the current framework is unworkable.
Ben
--
You received this message because you are subscribed to the Google Groups "dev-secur...@mozilla.org" group.
To unsubscribe from this group and stop receiving emails from it, send an email to
dev-security-po...@mozilla.org.
To view this discussion on the web visit https://groups.google.com/a/mozilla.org/d/msgid/dev-security-policy/CA%2B1gtaYACVE_sN_OdvczL_MKTX-sVE8PyFEhxfCoPRxi7CG04g%40mail.gmail.com.
Thanks, Amir,
I appreciate your challenge to the assumption that the 5-day rule is the primary reason why we are where we are. We recognize that this is a difficult area, and many CAs have fallen short. Let's explore whether to create a meta-incident, but the more important part is that we come up with a meta-solution. As part of the effort, we should continue looking at the root causes to gain a more comprehensive understanding of the issues at play.
Here are my thoughts on the observations you’ve provided:
That this problem is not affecting every CA equally suggests that there are other variables at play. We ought to look at the culture of compliance, the nature of their customer bases, and the CAs' relationships with their customers, among other variables, to understand these differences, which will then help us to identify better solutions.
We agree that OV and EV certificates are more affected, in part because the primary causes of misissuance are mistakes in the additional fields that such types of certificates contain, and perhaps because of the customers who use them.
We agree that CAs should not perceive that they can bypass the rules without consequence. We need to ensure that there are clear and consistent consequences for non-compliance, and the rules to achieve and maintain compliance need to be clearer.
I agree that the vast majority of the recent delayed revocation incidents would have still been delayed revocation incidents even if the period was extended to 20 days. However, I am hoping that a 20-day timeframe, along with an effort to phase out most of the excuses (by requiring quicker and more specific disclosures from CAs and their subscribers about reasons for delay), will reduce the scale of this issue. This effort will need collaboration by CAs and root stores. We should explore treating a failure to pre-disclose the required information, publicly, as a key focus of delayed revocation Bugzilla filings.
Finally, Mozilla also believes that automation, both in issuance and in replacement and revocation, is a path forward, but we need to move more in that direction first in the short term before it can become a long-term solution.
Also, regarding your final question, this discussion does not pause the BR requirement, but in dealing with CAs we shouldn't disregard the complexities of the issues presented.
Thanks again,
Ben
Based on what we observe in recent and current delrev incidents, it defies belief that any strategy involving categorizing certificates into those that warrant immunity from the revocation rules will be accurate, fair, or in the best interest of the WebPKI. Over the past four months we have watched more than a dozen public CAs choose to delay revocation of unambiguously misissued certificates for vast periods of time ranging into the span of months. Many of these incidents involved delayed revocation for the majority of the affected certificates. The CAs offer the flimsiest of excuses, or make no attempt at excuses at all. They drag out the same, tired comments about lengthy approval processes and prohibitive regulations. They use holidays, weekends, vacations, and end of quarter as excuses. They say these systems are critical until it turns out that would be against the CPS in effect, at which point conveniently these systems are NOT critical anymore.
Any parent quickly learns to detect when they’re being handed a line, and Bugzilla is being handed lines left and right. Most of these CAs don’t even display the creativity to make up their own bad fabrications and instead simply crib bad fabrications from those who have come before. The poster child here is the obligatory misrepresentation of Mozilla’s delayed revocation policy. This policy emphasizes that CAs are expected to follow the BR revocation deadlines every time, but CA after CA conveniently omits that part of the policy as they hand-wave about why this is not blatant disregard for the rules, as if the rest of us somehow lack the ability to look up the original policy to read and understand what it says.
It’s a sad state for the set of companies that supposedly are the guardians of public identity and the security of the WebPKI.
Regretfully, I for one have come to the conclusion that we cannot rely on Subscribers and their CAs to fairly categorize certificates into those qualifying for some kind of extended revocation timeline and those that do not. If we are to take reporting CAs at their word, then we know Subscribers have a propensity to actively lie to CAs or omit the facts if they see it being to their advantage in gaining a revocation delay. Or the more disturbing thought is that the CAs themselves are omitting, or lying, or coaching their Subscribers on how to omit and lie so that they can reliably delay revocation of certificates.
The fact is that Subscribers actually are able to replace these certificates on time. They simply don’t want to. It’s a hassle. It takes time away from other projects. It messes with their evenings and weekends. Sometimes it costs extra budget. Hell, if they can delay it long enough, a nontrivial number of certs will expire on their own and won’t require revocation at all. So, when given the opportunity to represent their processes and systems as incapable of agile certificate replacement, Subscribers sing that tune. When given the opportunity to explain the results of forced on-time revocation as disastrous, they sing that tune also.
CAs, likewise, are motivated the wrong way. They can be sticklers and make their paying customers angry at them, or they can be lenient and become heroes in the customers’ eyes. This can be a powerful temptation, and we have seen CAs succumb over and over again.
Perhaps if we could create some kind of objective, perfect, fair, consistent, and externally measurable criteria for certificate use cases and circumstances warranting a revocation delay, then those rules could be enacted for all CAs to follow equally. But I don’t see any credible candidates for these criteria, have never heard a legitimate proposal for such a thing, and do not believe it is possible.
Making CA opinion the basis for judging which certificates deserve a deadline extension is also unworkable. We have the abovementioned problems with Subscriber and CA credibility. Additionally, there is simply no way a CA has the visibility and detailed operational knowledge needed to genuinely evaluate a Subscriber’s ability to swap out certificates in a given timeframe and the consequences of failure to do so.
There is, however, an organization that is intimately familiar with the Subscriber’s processes and abilities and the consequences of missing certificate replacement on time. This organization is capable of making risk/reward tradeoffs for certificate agility versus other initiatives and can enact real-time resource and process adjustments to deal with unforeseen revocations. That organization is, of course, the Subscriber whose certs are up for revocation.
Subscribers who learn their certificates will be disappearing at a specific time roughly 100 hours from now always seem to have the new certificates installed before the old ones die. Always. We know this at Sectigo because for the past few years we have not entertained delayed revocation requests by any Subscriber in any environment for any reason. We simply let them know when their certificates will stop working and focus on helping them obtain and install replacements. And I personally believe, unless it becomes codified as an exception policy in the relevant regulations, that Sectigo will never entertain the idea of purposefully delaying revocation again.
I firmly believe this to be the only viable path forward. We need to abolish the deliberate delay of mandated revocations.
Removal of deliberate delrevs serves the WebPKI in many ways.
- It maintains a clean and compliant certificate base for Relying Parties to trust.
- It increases motivation for CAs to strive for error-free operations.
- It encourages automation and certificate agility among Subscribers, who know they won’t be able to talk their way out of a revocation event.
- It is consistent, fair, simple to understand, and easily measured.
- It teaches CAs and anyone else watching the WebPKI that the rules matter and must be followed.
- It eliminates counterproductive motivators influencing CAs today.
- There is a clear path to success that every CA has the technical and procedural ability to execute.
Of course, a reading of the Baseline Requirements and the major root store program guidelines will reveal this requirement today. However, we are missing meaningful, reliable consequences for failure to comply. In each of these cases the CA believed the negatives of transparent disobedience to the BRs to be less than the negatives of completing the revocation. Otherwise they would not have delayed.
And with few exceptions they were probably right. Most of the CAs with willful delrev incidents from the past four months, or the past four years, will not face distrust as a direct result. And in an ecosystem with no other penalty for noncompliance, this means most CAs are pragmatically motivated to appease their paying customers – or their bosses in the larger organizations that own them – even at the expense of the WebPKI. Right now, the penalty is that you have to write up a Bugzilla incident and answer uncomfortable questions from a few nosy jerks for a couple of months until everyone gives up in frustration and lets you close the bug. In many cases, the CA judges this to be considerably less painful than angering one or more of the Subscribers that keep the CA operational.
We need root programs to enforce these rules with enough power to tip the decision-making scales. We need CAs to dread the consequences of delayed revocation more than they dread Subscriber displeasure. That has to mean either 1) that the likelihood of root distrust goes up dramatically among one or more major root programs or 2) that the WebPKI comes up with some kind of alternative consequence. This consequence would have to be painful enough to seriously demotivate intentional delay but not so severe that browsers are unwilling to use it.
All,
In addition to the ideas stated in my previous email, I’d like to get your thoughts on some of the steps we can take while we discuss the current 5-day revocation requirement in BR 4.9.1.1.
We can, and should, develop more stringent disclosure requirements that compel CAs to provide advance descriptions of the circumstances under which they cannot revoke a certificate within the required timeframe. My proposal is that we modify Mozilla's guidance on delayed revocation. We will preserve the statement that “Mozilla does not grant exceptions to the BR revocation requirements.” The revisions would mandate full disclosure in advance so that the community can evaluate the CA's and subscriber's arguments for delayed revocation. Also, there have been too many instances where CAs have failed to include all the necessary information in preliminary delayed revocation incident reports. Moving forward, and for existing delayed revocation bugs, CAs would need to closely follow the updated instructions, which would require specificity when claiming exceptional circumstances, significant harm, or critical infrastructure, all on a per-certificate basis rather than a per-subscriber basis. Moreover, CAs would be required to attest that they have communicated with, or will shortly communicate with, their auditors, supervisory bodies (if applicable), and all Root Stores that they participate in to indicate that they have begun a process to analyze the risk and formulate a remediation plan to address delayed revocation. This comprehensive approach would ensure that we are not just relying on CA opinion but are creating a structured, transparent process that the entire community can trust and verify.
Thoughts?
Ben
Thanks, Wayne.
We understand that it might appear we are perpetuating a misunderstanding that CAs and subscribers can claim exceptions. However, our intention is to phase out this approach over time while we gather more information to inform our decisions. We believe this effort will lead to stricter requirements and the eventual elimination of delayed revocations.
Thanks again,
Ben