Feasibility of a binding commitment to revoke before issuance

Tim Hollebeek

Jul 15, 2024, 5:22:30 PM
to dev-secur...@mozilla.org

Hello,

 

I wanted to run an idea past the Mozilla community that’s sort of half-baked, but maybe the community can help flesh it out.  This is mostly a personal idea of mine at this point, but if there is enough interest / support it might become a DigiCert initiative.

 

The idea is motivated by the observation that it’s absolutely impossible to determine with any confidence whether a particular certificate can be replaced within a 24-hour or five-day period (unlike many of the participants whose experience here is hypothetical, I’ve done my time in the trenches).  Many customers can safely replace, with various degrees of expense and effort, but many customers can’t replace safely, and the incentives all point in the wrong direction, so getting reliable information to make determinations is a real challenge, and very time consuming.  And as much as I love crypto agility, not having anyone get hurt or die is also pretty high on my priority list.  And there are some cases where that’s exactly what’s on the table.

 

As I stated in Bergamo, instead of trying to apply band-aids to this very broken process, I think we need to entirely rethink how we do things in order to make any progress.  So I’m going to start throwing out some ideas that might work, and maybe we’ll eventually converge on a solution that has a chance of working.  Because the current system doesn’t.

 

This is just the first of a bunch of proposals, so here we go:

 

If a publicly-trusted certificate is difficult to replace, for various regulatory or technical reasons, the real reasons do not magically appear when rotation is necessary.  But a host of fake reasons are likely to arise (“we can’t rotate certificates faster because it costs money we don’t want to spend”).  Furthermore, making progress on this problem would be greatly assisted by better information about exactly which certificates can’t be replaced, the timescale on which they CAN be replaced, and why. 

 

The world would be better if we all knew, IN ADVANCE, which certificates are automatically replaceable, and which aren’t.  This would also greatly streamline operations when replacements are necessary, as it removes the burden of making the determinations with a ticking clock, which is a situation that doesn’t lend itself to careful and unbiased evaluations.

 

This would make things better in a number of ways:

 

  1. For organizations that do use automation, or are able to deal with the agility requirements, the certificates just get revoked on the agreed upon timelines.  No discussions, no exceptions.  This would be the standard way certificates work going forward.
  2. For organizations that can’t, we would have transparent information about the existence of the issue, the nature of the problem, and the severity (feasible timeline).  This would aid in efforts to evaluate and implement methods to reduce the scope and severity of the WebPKI agility issues.  Currently, we never find out about these issues until replacement is needed, and at that time it’s too late to do anything.
  3. Once the information is available, it will be much easier to have reasonable discussions about the nature and distribution of problems, what the root causes are, and how various regulations and industry practices can be changed to encourage and enhance crypto agility, instead of holding it back.
  4. And furthermore, it would be better if these disclosures came directly from the subscribers, because they are the only ones in a position to know the ground truth.  Subscribers could be held publicly accountable for the accuracy and reasonableness of their determination and need.
  5. Hopefully, the need to declare their inability to rotate certificates would “encourage” organizations to improve their crypto agility until they no longer needed to appear on the list of people holding back WebPKI agility.  It would also allow requirements to be written about what practices are acceptable and what aren’t, and compliance with timelines to move away from questionable practices could be monitored easily.

 

The information could even be contained in a certificate extension, so that the rotation practices of these organizations are transparent.  That would then make it possible to track the effectiveness of initiatives to reduce the barriers to rotation of WebPKI certificates.  There’s even a chance we could actually use the information to make revocation and rotation better, instead of just arguing about it on internet forums!
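
As a rough illustration of what such a disclosure might carry, here is a minimal sketch. Nothing in it comes from an existing standard: the record type, its field names, and the choice of 24 hours as the cutoff are all assumptions made only to show the idea of declaring, in advance, the timescale on which a certificate CAN be replaced and why.

```python
from dataclasses import dataclass
from typing import Optional

# Revocation windows discussed in this thread: 24 hours and five days.
STANDARD_WINDOWS_HOURS = (24, 5 * 24)


@dataclass(frozen=True)
class ReplacementDeclaration:
    """Hypothetical subscriber disclosure; every field name is an assumption."""
    serial: str
    feasible_replacement_hours: int  # timescale on which the cert CAN be replaced
    reason: Optional[str] = None     # why replacement is slow, if it is

    def meets_standard_timelines(self) -> bool:
        # A certificate is treated as "just revoke it" only if it can be
        # replaced within the shortest standard window.
        return self.feasible_replacement_hours <= min(STANDARD_WINDOWS_HOURS)


def flagged(declarations):
    """Return declarations that cannot meet the standard windows, slowest first."""
    slow = [d for d in declarations if not d.meets_standard_timelines()]
    return sorted(slow, key=lambda d: d.feasible_replacement_hours, reverse=True)
```

A list produced by something like `flagged` is the kind of transparent, trackable data the proposal is after; whether it would live in an extension or somewhere else is left open here.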

 

One potential downside is that this would make critical certificates stick out like a sore thumb, but I think on balance, the transparency is more valuable than the disclosure risk.  I’ve never been a huge security by obscurity fan, anyway.

 

I realize this would be a major change to how we do things, but we’ve been having this exact same conversation about certificate replacement for pretty much the entire decade I’ve been involved at CABForum, and I think it’s time for radical change.  If this isn’t the right idea, it at least gives a sense of the kind of change that is needed to make progress here, and I would love to hear any other potential ideas for how we finally exit the traffic circle and start moving forward again.

 

-Tim

 

Matt Palmer

Jul 15, 2024, 10:09:59 PM
to dev-secur...@mozilla.org
Hi Tim,

On Mon, Jul 15, 2024 at 09:22:22PM +0000, 'Tim Hollebeek' via dev-secur...@mozilla.org wrote:
> If a publicly-trusted certificate is difficult to replace, for various
> regulatory or technical reasons, the real reasons do not magically appear
> when rotation is necessary. But a host of fake reasons are likely to arise
> ("we can't rotate certificates faster because it costs money we don't want
> to spend"). Furthermore, making progress on this problem would be greatly
> assisted by better information about exactly which certificates can't be
> replaced, the timescale on which they CAN be replaced, and why.
>
> The world would be better if we all knew, IN ADVANCE, which certificates are
> automatically replaceable, and which aren't. This would also greatly
> streamline operations when replacements are necessary, as it removes the
> burden of making the determinations with a ticking clock, which is a
> situation that doesn't lend itself to careful and unbiased evaluations.

If I'm understanding your proposal correctly, it basically requires
organisations to identify, in advance, certificates which cannot be
replaced in line with the WebPKI requirements.

If so, while I agree with the motivations (to have more useful
information), I have... questions:

1. What is the motivation for an organisation to take the time and
effort to identify all problematic certificates? These organisations
apparently don't have the available resources to fix the current
problems, what will their reaction be to being asked to do even more
work?

2. If an organisation does not proactively declare a problematic
certificate as being problematic, what are the consequences at
revocation time? I can't imagine that CAs will be willing to revoke
those certificates even though the organisation has not declared them as
problematic, for the same reasons that those CAs are not willing to
currently revoke problematic certificates.

3. If an organisation is capable of proactively identifying problematic
certificates, why issue a WebPKI certificate at all? On its face, a
declaration that a certificate is incapable of being rotated in line
with the requirements of the WebPKI is an admission that the customer is
(or at the very least expects to be) in breach of their subscriber
agreement.

4. For certificates that are problematic, why add an extension to a
WebPKI certificate that says "this certificate is non-compliant", rather
than just moving that usage to a private PKI?

5. Do you have any reason to believe that CAs and their customers will
even be *willing* to disclose this sort of information? In every
previous incident that comes to mind, the prevailing attitude from CAs
has been to refuse to disclose customer information in any meaningful
fashion. I can understand their reticence there on one level, as a
protection against "customer poaching"[1], and I'd be hesitant for Mozilla
to make it a requirement for CAs to disclose this from an anti-trust
action perspective.

> I realize this would be a major change to how we do things, but we've been
> having this exact same conversation about certificate replacement for pretty
> much the entire decade I've been involved at CABForum, and I think it's time
> for radical change. If this isn't the right idea, it at least gives a sense
> of the kind of change that is needed to make progress here, and I would love
> to hear any other potential ideas for how we finally exit the traffic circle
> and start moving forward again.

My proposal is that root programs require CAs to accept revocation
requests from the root programs themselves for randomly-chosen
certificates. At random intervals, a root program sends a (suitably
authenticated) email to the CA's problem reporting address stating "this
certificate should be considered compromised as of this moment, revoke
in line with the BRs". Frequency and volume could be tuned to issuance
volume, with upper and lower bounds as needed to ensure universal
coverage without unduly burdening any particular CA with excessive
administrivia.
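
To make the "tuned to issuance volume, with upper and lower bounds" part concrete, here is a minimal sketch. The sampling rate and both bounds are placeholder numbers, and the function names are hypothetical; the only point is that the sample size scales with issuance and is then clamped.

```python
import random


def sample_size(issued: int, rate: float = 0.0001,
                lower: int = 1, upper: int = 50) -> int:
    """Number of certificates to pick for one CA in one period.

    rate, lower and upper are illustrative assumptions, not proposed values.
    """
    if issued <= 0:
        return 0
    scaled = round(issued * rate)
    # Clamp to [lower, upper], and never ask for more certs than exist.
    return min(issued, max(lower, min(upper, scaled)))


def pick_for_revocation(serials, rng=None, **kwargs):
    """Randomly choose serial numbers to treat as compromised."""
    rng = rng or random.Random()
    serials = list(serials)
    return rng.sample(serials, sample_size(len(serials), **kwargs))
```

With these placeholder numbers a CA that issued 100 certificates still gets one request (the lower bound), while a CA that issued ten million gets 50 rather than 1,000 (the upper bound).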

I base this proposal on two factors:

1. Regular testing of processes is important to be confident that those
processes work. When I was running the Pwnedkeys Revokinator, I found
plenty of problems with revocation practices at several CAs, resulting
in multiple problem reports. I'd be more than willing to resurrect the
Revokinator to once again analyse revocation processing compliance if I
had confidence in support for it by root programs.

2. It would put *everyone* in the ecosystem on notice that revocation is
something that needs to be planned for. At the moment, organisations
can deploy their infrastructure on the basis that "it'll never happen to
us, we don't lose our keys / suffer from bugs / whatever", and they
don't consider other causes of revocation. While the probability of any
particular certificate getting chosen would be very low, that *definite*
non-zero probability is likely to get more attention than any number of
out-of-the-ordinary incidents that organisations can dismiss with "well,
*that* would never happen to us!"

- Matt

Ben Wilson

Jul 24, 2024, 2:36:31 PM
to dev-secur...@mozilla.org, Matt Palmer, Tim Hollebeek

Dear Tim and Matt,

Thank you both for your insightful comments and contributions to the ongoing discussion regarding timely certificate revocation. Your perspectives are invaluable as we strive to find balanced and effective solutions to this problem.

Tim, your proposal to identify problematic certificates in advance and make this information transparent not only addresses the core issue of preparedness, but also encourages organizations to improve their crypto agility.

Matt, your questions and alternative proposal for regular, randomized revocation testing are equally thought-provoking. Regular testing would ensure that processes are robust, and that organizations remain vigilant about their revocation capabilities.

Given the complexity and importance of this issue, I would like to keep the discussion alive and invite additional comments from the Mozilla community. 

Personally, I currently favor extending the timeframe for the revocation of certificates that have no security impact, e.g. to 20 days (exact language TBD – e.g. by adding a new subsection to section 4.9.1.1 of the Baseline Requirements). I understand that extending the timeframe from 5 days to 20 days for some types of revocations might raise questions about the empirical basis for my position, especially concerning our continued preparation for 24-hour revocations when security compromises like we experienced with Heartbleed happen, but here are some points to consider. My review of past Bugzilla incidents shows that many delayed revocations are not related to security issues, but to compliance details that do not pose immediate security risks. We have also received consistent feedback from CAs and subscribers that the 5-day window for these types of revocations is too restrictive and does not reflect the operational realities of many organizations. The current 5-day timeframe does not account for holidays, weekends, and other operational delays. Extending the timeframe provides a more realistic window for organizations to respond without compromising their operational integrity. Some organizations face legal and regulatory hurdles that make immediate revocation challenging, and extending the timeframe can help them comply with both CA/B Forum requirements and local laws. When adopting any security-related measure, such as revocation, a cost-benefit-based risk analysis should be done. The analysis should justify why a 5-day period is necessary when a 20-day period might be just as effective without imposing undue burdens. Finally, extending the timeframe for non-security-related revocations does not hinder preparation for 24-hour revocation timelines for critical security incidents. In fact, it allows organizations to better allocate resources and develop robust processes that can be quickly mobilized in the event of a security compromise.

But whatever decision we reach as consensus is good for me--our collective goal should be to find solutions that work best for the entire community, and it would be great if we could come up with some solutions and then recommend them to the Server Certificate Working Group of the CA/Browser Forum. To facilitate this, I propose that we continue to gather more input from the community, and try to understand the different perspectives, which will help us refine suggestions and identify potential challenges and solutions. Everyone’s continued engagement and support are crucial as we work towards a consensus. I encourage everyone in the community to share their thoughts and suggestions to help us develop a robust and effective strategy to improve security while reducing the number of CA incidents that are due to delayed revocation.

Thank you once again for your contributions, and I look forward to our continued collaboration on these important issues.

Best regards,

Ben

Amir Omidi (aaomidi)

Jul 24, 2024, 3:45:37 PM
to dev-secur...@mozilla.org, Ben Wilson, Matt Palmer, Tim Hollebeek
Hey Ben,

I think that the suggestion to increase the time frame for revocation from 5 to 20 days is dangerous. Here are a couple of issues I have with this:

First: Security Impact Analysis is very difficult. It's arguably harder than root cause analysis. The majority of CAs (by count, not issuance) do an awful job at root cause analysis. I do not think they are (or, honestly speaking, will ever be) at the maturity level to do security impact analysis within 24 hours to determine if this is a 24-hour or 20-day revocation deadline.

Second: We're effectively going to be left with very few situations that necessitate 24-hour revocations. This proposal:
  1. Makes it harder to test out if mass revocations will actually work when they're required.
  2. Discourages entities from adopting Certificate Lifecycle Management (CLM).
  3. Makes it significantly more difficult to reduce certificate lifetimes to a 90-day maximum in the future.
  4. Sacrifices Web PKI security because of a handful of enterprise companies that have the money and talent to solve this problem internally, but are choosing to invest in ${literally_anything_else} instead.
Third: Holidays, weekends, etc. are not really relevant here either, because any of these incidents can become a 24-hour revocation incident anyway, and if the 24-hour revocation incidents are not happening often enough, then CAs will not be ready to execute on a revocation like that. If this is too prohibitive for a CA to staff itself so it can handle revocation within 24 hours, they should consider not being a CA.

Fourth: Root Program enforcement of the existing policies is weak. Mozilla & Apple & Microsoft still have not distrusted Entrust despite the clear negligence in their operations. So what happens if a CA doesn't revoke in 20 days? Or misses a 24-hour revocation requirement? Any sort of rule change here without significantly upping the enforcement is not okay imo.

Fifth: We already have a way for CAs and Subscribers to avoid the need for revocation: Short lived certificates.

Sixth: The distribution of who benefits and who is hurt by this change is interesting. For example, on the CA and subscriber side:
  1. Top CAs (in terms of issuance load), are either fully automated, or have automation integrated with part of their product. Some of these CAs also provide CLM solutions to avoid outages due to CA issues. So they're not really going to benefit from this.
  2. Majority of subscribers (in terms of numbers of certificates held) have, or are planning to implement CLM into their products. So they don't really get any benefit from this proposal either.
The folks that really benefit from this change are:
  • Boutique CAs that have barely adopted automation for their CA issuances. (e.g. some small CAs, some government CAs, etc)
  • A handful of enterprise subscribers that are not investing into CLM and are relying on manual work for certificate replacement.
The folks that are hurt, quite a bit, by this change are the end users (many of whom look to Mozilla to protect them when many other RPs do not). This change would make the web less safe for everyone by giving more allowances for the bad CAs and Subscribers to continue their bad behavior.

Anyway, this change encourages more hands-on and non-automated certificate lifecycle management. This would be a regression in the ecosystem.

Alternative Proposal

This is going to be pretty controversial too: I'd be in favor of removing the 5-day category altogether, and require a 24-hour revocation for all mis-issuances (probably as a step function, lowering the 120 hour time limit by 24 hours every 6 months or something until they're aligned?)
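
A minimal sketch of that step function, assuming the numbers floated above (start at 120 hours, drop 24 hours every 6 months, stop at 24 hours); the function name and parameters are hypothetical:

```python
def revocation_deadline_hours(months_elapsed: int, start: int = 120,
                              target: int = 24, step: int = 24,
                              interval_months: int = 6) -> int:
    """Deadline in hours for the former 5-day category after a given time."""
    steps = max(0, months_elapsed) // interval_months
    # Never go below the 24-hour target once the two categories are aligned.
    return max(target, start - steps * step)


schedule = [(m, revocation_deadline_hours(m)) for m in range(0, 30, 6)]
# [(0, 120), (6, 96), (12, 72), (18, 48), (24, 24)]
```

Under these assumed values the two deadlines align after four steps, i.e. 24 months.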

My justification for this is the inverse of the stuff I mentioned above. In other words, it forces companies to adopt automation, removes ambiguity from the side of CAs, and generally propels the ecosystem forward. This also means that we get more assurances that when a Crowdstrike situation hits Web PKI, we actually can respond in a reasonable time frame. This proposal also significantly simplifies the communications CAs must have with their subscribers about why a certificate is being revoked.

Amir

Jeremy Rowley

Jul 24, 2024, 3:49:41 PM
to Matt Palmer, dev-secur...@mozilla.org
Hi Matt - to continue the conversation (and note that I am not Tim and that
these are my own thoughts, not necessarily the position of Digicert):

> 1. What is the motivation for an organisation to take the time and effort
> to identify all problematic certificates? These organisations apparently
> don't have the available resources to fix the current problems, what will
> their reaction be to being asked to do even more work?

I think you'd find organizations willing to admit this information upfront
if it was the only way to delay revocation past the required timeframe. This
also is a question that can be asked to capture which companies are using
automation and expand on reasons some companies are not using automation.
One note is that I disagree that resources are a major issue. They always
are, but I believe the real issues are policies that prevent quick
replacement and less efficient practices. Even people with ACME sometimes
have weird approval workflows before the automation can do its job. This
plan also doesn't account for random stuff that goes wrong. For
example, you have a major breakage during cert replacement. I don't see how
people can account for that upfront. Will it prevent all delays? Likely no,
however this idea gets user information upfront to the community instead of
trying to provide that while operating in crisis mode.

> 2. If an organisation does not proactively declare a problematic
> certificate as being problematic, what are the consequences at revocation
> time? I can't imagine that CAs will be willing to revoke those certificates
> even though the organisation has not declared them as problematic, for the
> same reasons that those CAs are not willing to currently revoke problematic
> certificates.

I think it helps emphasize that prompt revocation is required, i.e. "You
said you could revoke in 5 days and now can't - what changed?" Although the
revocation language already appears in agreements as boiler-plate, it
doesn't hurt to call it out. The other reason I like Tim's idea is because
it's pretty easy to implement and see what we get back. Do subscribers care
so much about delayed revocation that they are willing to state they can't
do it? I think you'd want the Mozilla policy to be that delayed revocations
are not accepted unless this is declared up front.

> 3. If an organisation is capable of proactively identifying problematic
> certificates, why issue a WebPKI certificate at all? On its face, a
> declaration that a certificate is incapable of being rotated in line with
> the requirements of the WebPKI is an admission that the customer is (or at
> the very least expects to be) in breach of their subscriber agreement.

The providers are part of the WebPKI as it includes online banking or
healthcare, which are accessed via browser. The fact a subscriber would be
in breach of their agreement with the CA is interesting. That would need
some workshopping.


> 4. For certificates that are problematic, why add an extension to a WebPKI
> certificate that says "this certificate is non-compliant", rather than just
> moving that usage to a private PKI?

A lot of these aren't private-facing. They use a browser for access to the
site or service.

> 5. Do you have any reason to believe that CAs and their customers will
> even be *willing* to disclose this sort of information? In every previous
> incident that comes to mind, the prevailing attitude from CAs has been to
> refuse to disclose customer information in any meaningful fashion. I can
> understand their reticence there on one level, as a protection against
> "customer poaching"[1], and I'd be hesitant for Mozilla to make it a
> requirement for CAs to disclose this from an anti-trust action perspective.

I think disclosure is the cost of not being able to revoke in 5 days. I
definitely agree with you that disclosing this information would be hard to
make happen, but the cost to set up the experiment is pretty low.

Anyway - I wanted to respond so the thread didn't get lost as I liked your
comments and Tim's proposal.

Wayne

Jul 24, 2024, 4:18:40 PM
to dev-secur...@mozilla.org
Hi Ben,

I'm going to have to agree with what Amir has outlined but I won't waste words repeating every one of his points. What I will add is the following:

There is a lack of historical underpinning on what caused the 5-day revocation window to appear in the first place: it already is a compromise to the CAs. There should be heavy suspicion of a CA heavily advocating this position when a few months ago they were trying to toe the line with maybe 15 days, and now it's pushed to 20 days. We're in a self-regulated field that needs strong enforcement, not kowtowing to the lowest standards and least capable subscribers to encourage perverse incentives to reduce certificate agility and regulatory compliance.

The mention of operational realities is a clear case that the CAs and subscribers are not intending to operate within the baseline requirements as agreed by all parties. Instead their priority is to tell you whatever will convince you to lower the standards to their level. We need to push for better standards across the board especially in encouraging a shift to short-lived certificates, and at least 90-day certificates in the near-future. Any attempt to reduce enforcement mechanisms and standards at this time is not going to do anything but slow down any further advancement in those areas.

If Mozilla are currently trying to convince themselves that reducing oversight will let these CAs magically allocate resources and personnel appropriately during a crisis, I don't know what to say.

As far as any concrete data fitting what these proposals are, you won't find it. I already produced a very generous breakdown on 20 incidents across 17 CAs 3 months ago. The lack of any input from these CAs purporting to have difficulties shows this is not about creating policy from data. I was very careful in producing that report to ensure I wasn't cutting the deadline off at 15 days, and went to the effort of including what things looked like after 30 days too. They still failed spectacularly, and we are not making good progress on that part to date. Most CAs in those incidents didn't even use the correct method of timing for the start of their 24h/5d clocks, so it's far more generous than it seems.

I would advise that Mozilla are advocating a very dangerous and short-sighted policy currently and need to speak with other Root Programs before they start advocating lower standards for everyone. Given the information I've seen from the other programs, I strongly doubt this is an acceptable policy and only exists to burn Mozilla's credibility. If you have CAs upset at you for using your enforcement mechanisms, that means your regulatory program is operating correctly; silence is the worst thing you can hear.

- Wayne

Mike Shaver

Jul 24, 2024, 4:53:05 PM
to Ben Wilson, dev-secur...@mozilla.org, Matt Palmer, Tim Hollebeek
On Wed, Jul 24, 2024 at 2:36 PM 'Ben Wilson' via dev-secur...@mozilla.org <dev-secur...@mozilla.org> wrote:

> Personally, I currently favor extending the timeframe for the revocation of certificates that have no security impact,

I propose, tongue only slightly in cheek, that if a component of the certificate doesn't have security impact, it be removed from the certificate specification in the BRs. TLS certificates are precious space, and reducing their size by eliminating things that have no use in security contexts would be worthwhile.

Are there any specific examples of what deviations of certificates would be deemed so minor that they can stay live on the web for 20 days, but still worthy of revocation? With the number of CAs out there, and the rate of certificate issuance and errors related thereto, it would be a virtual guarantee that there are a number of flavours of "slightly wrong" certificates active on the web. That means that everything needs to handle such certificates existing, in order to operate as part of the web PKI, so we should just capture in the standard that this alternative shape is OK and let everyone issue in that expanded envelope all the time. Presumably the web would benefit from this in some way, if the CABF would entertain such changes, but I confess that I can't tell what that benefit is.

Similarly, I might ask: how much of a grace period should Firefox give for accepting a certificate after it has expired? I mean, what's 20 days? It expired naturally, after all...

More seriously, I don't think that we are generally in a position to be certain that no system exists which depends on a certain property of a certificate. Is there something out there that is gating access or acting differently on the basis of a case-sensitive country code match? If there is, the designers certainly weren't wrong to build it that way, IMO. The BRs are a commitment to the world that web PKI certificates will behave a certain way, and laxity on making sure that certificates actually do conform will mean that effectively the BRs are no longer really true or useful for their purpose.

In addition to whatever subjective assessment a CA might make (hardly as a disinterested party, I hasten to add) about the security implications of a given certificate's deviation from the BRs, there is also the concern of interoperability. A new entrant to the web (such as a browser like Ladybird, or a CA like the next Let's Encrypt, or some future CDN Fastflare) will need to not only implement to meet and handle the *specified* certificate forms and behaviours, but also somehow know about all the kinds of variants that are likely to be floating around at any given time. Mozilla and Firefox know first-hand and extensively what a barrier it can be when the standard says that things should be one way, but other parties produce or expect something different.

Finally, conformance to the standards and correct issuance is just not that hard, as regards the things that have been argued to be "too minor to revoke in 5 days". They would virtually all have been caught by decent linting. I don't see how it helps the web to make these cases easier for CAs to handle. It seems only that it would benefit CAs who routinely misissue sloppy certificates in "minor" ways. If they can't get these little things right, how can we trust their key material management or background checks or entropy sources? It's not like we're seeing the raw audit reports, even though they are really for the benefit of the root programs.

Maybe the job of being a CA is too hard for some organizations that are doing it now. That's OK. The Web doesn't need all of the CAs we have today as much as it needs CAs that help move the integrity of web PKI *forward*, rather than weakening it a little bit at a time for their convenience when they have failed to meet their commitments.

Mike

Ben Wilson

Jul 24, 2024, 5:05:51 PM
to dev-secur...@mozilla.org, Mike Shaver, Matt Palmer, Tim Hollebeek, Amir Omidi (aaomidi), Wayne, Jeremy Rowley

Thanks, everyone, for keeping this conversation going. It's essential that we continue because I believe the current framework is unworkable.

Ben

Amir Omidi

Jul 24, 2024, 5:06:46 PM
to Ben Wilson, Jeremy Rowley, Matt Palmer, Mike Shaver, Tim Hollebeek, Wayne, dev-secur...@mozilla.org
What are the issues you see from the perspective of a root program with the current framework?

Mike Shaver

Jul 24, 2024, 5:10:26 PM
to Amir Omidi, Ben Wilson, Jeremy Rowley, Matt Palmer, Tim Hollebeek, Wayne, dev-secur...@mozilla.org
On Wed, Jul 24, 2024 at 5:06 PM Amir Omidi <am...@aaomidi.com> wrote:
> What are the issues you see from the perspective of a root program with the current framework?

Yes, it would be good to understand what the goals of the framework are, how the current rules work against those goals, and how different approaches (another deadline extension, a “bad cert, pls ignore” attribute, random audit through revocation, etc.) would better reach them.

Without that it is hard to really figure out what might be helpful, since we may well have different goals in mind!

Mike

Ben Wilson

Jul 24, 2024, 6:11:03 PM
to Mike Shaver, Amir Omidi, Jeremy Rowley, Matt Palmer, Tim Hollebeek, Wayne, dev-secur...@mozilla.org

Mike and Amir,

Here are some of the goals that come to my mind from the perspective of the Mozilla Root Program, followed by my short response concerning what to do with the current framework.

  1. Security and Privacy of Users: Our foremost goal, from Principle #4 of the Mozilla Manifesto, is to ensure the security and privacy of our users. This includes promoting the advancement and proper use of TLS technology to provide privacy and security.
  2. Operational Stability: Another critical goal is to maintain the stability of the internet, ensuring that our actions do not inadvertently cause widespread disruptions.
  3. Secure CA Operations: Ensuring that Certification Authorities (CAs) operate securely is paramount. Our goal is to work collaboratively with them as partners in securing the internet.
  4. CA Compliance with Continuous Improvement: We strive for a smooth-running CA program, focusing on proper remediation of CA compliance issues, so it’s not just about closing compliance bugs in Bugzilla. Improving CA transparency through better incident reporting processes is key to this goal. We also aim to improve the incident reporting process continually, encouraging disclosure and remediation in a way that benefits the entire community.

Currently, the 5-day revocation period is not working effectively, as evidenced by ongoing issues documented in Bugzilla. As I said before, I’d like to reach a consensus determination on what is best for the ecosystem. While I understand the argument for stricter revocation timelines, I believe there are broader considerations based on how this valuable TLS technology is currently being used to support healthcare, airlines, banking, etc. 

Contemporaneously with this discussion here, I plan to turn my attention to GitHub Issue #276 and start addressing the issue with better guidance in the wiki about reporting expectations and with new language (TBD) to be added to the Mozilla Root Store Policy. I also plan to be more proactive in commenting on CA compliance reports.

In summary, Mozilla's goals align closely with those of other root programs--maintaining control over CAs and minimizing their non-compliance while ensuring secure and effective CA operations.

Thanks, and keep the conversation going so that we can come to some consensus.

Ben

Matt Palmer

Jul 25, 2024, 1:47:52 AM (2 days ago) Jul 25
to dev-secur...@mozilla.org
On Wed, Jul 24, 2024 at 04:52:51PM -0400, Mike Shaver wrote:
> On Wed, Jul 24, 2024 at 2:36 PM 'Ben Wilson' via
> dev-secur...@mozilla.org <dev-secur...@mozilla.org> wrote:
> > Personally, I currently favor extending the timeframe for the revocation
> > of certificates that have no security impact,
>
> I propose, tongue only slightly in cheek, that if a component of the
> certificate doesn't have security impact, it be removed from the
> certificate specification in the BRs.

I'm not sure why you'd have your tongue anywhere near your cheek -- it's
an excellent proposal. The same question, of course, should apply to
the BRs as a whole, rather than just certificate contents, as there are
operational-related reasons for revocation, not just "the bits in the
cert are wrong".

> Are there any specific examples of what deviations of certificates would
> be deemed so minor that they can stay live on the web for 20 days, but
> still worthy of revocation?

My reading of Ben's suggestion was that anything that CAs can currently
take five days for would instead have a 20 day deadline, so it'd be any
of points 6-16 in BR s4.9.1.1.

While I can see the logic that leads to a suggestion of "let's give CAs
and subscribers more leeway on the little things", I don't think it
would be a win for the WebPKI. Revocation is already a "never" event
(as in, many subscribers don't think it could ever happen to them);
allowing CAs to reassure customers that not only won't it happen to
them, if it *were* to happen to them, they'd have 20 days to remedy it,
does not give any motivation to improve their certificate installation
practices. Which means that when, say, an employee uses the private key
of the company's TLS certificate as an example in a blog post[1], it'll
still take the company a couple of weeks to get around to replacing the
certificate.

Which is not what we want to encourage.

- Matt

[1] This is not a hypothetical:
https://www.hezmatt.org/~mpalmer/blog/2023/06/12/private-key-redaction-redux.html

Matt Palmer

Jul 25, 2024, 1:50:02 AM (2 days ago) Jul 25
to dev-secur...@mozilla.org
On Wed, Jul 24, 2024 at 12:45:37PM -0700, Amir Omidi (aaomidi) wrote:
> This is going to be pretty controversial too: I'd be in favor of removing
> the 5-day category altogether, and require a 24-hour revocation for all
> mis-issuances (probably as a step function, lowering the 120 hour time
> limit by 24 hours every 6 months or something until they're aligned?)

My Name is Matt Palmer, and I Approve This Message.

(cue fade to image of waving flag with patriotic music in the
background)

- Matt
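
For reference, the step function Amir suggests in the quoted text above can be sketched in a few lines. This is only one reading of it: the function name and parameters are hypothetical, and the "or something" in the original leaves the exact step size and cadence open.

```python
def revocation_limit_hours(months_elapsed: int,
                           start_hours: int = 120,
                           floor_hours: int = 24,
                           step_hours: int = 24,
                           step_months: int = 6) -> int:
    """Deadline in hours after `months_elapsed` months of the phase-down.

    Assumed schedule: start at the current 120-hour (5-day) limit and
    drop 24 hours every 6 months until it matches the 24-hour limit.
    """
    steps_taken = months_elapsed // step_months
    return max(floor_hours, start_hours - steps_taken * step_hours)


# Print the assumed schedule at each six-month boundary.
for months in range(0, 31, 6):
    print(months, revocation_limit_hours(months))
```

Under these assumptions the two deadlines would align two years after the phase-down starts.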

Matt Palmer

Jul 25, 2024, 2:51:23 AM (2 days ago) Jul 25
to dev-secur...@mozilla.org
On Wed, Jul 24, 2024 at 07:49:32PM +0000, Jeremy Rowley wrote:
> > 1. What is the motivation for an organisation to take the time and effort
> to identify all problematic certificates? These organisations apparently
> don't have the available resources to fix the current problems, what will
> their reaction be to being asked to do even more work?
>
> I think you'd find organizations willing to admit this information upfront
> if it was the only way to delay revocation past the required timeframe.

But I don't see this as being the case in this proposal. There's
nothing that changes the status quo with regards to CA decision making
when it comes to deciding whether to delay revocation on other
(non-nominated) certificates.

If a CA were to, say, make a binding commitment that, in the event of
delayed revocation of non-nominated certificates, the CA would pay
$10,000 per certificate-hour to a specified charity or else voluntarily
remove themselves from all root stores, *that* would be something worth
paying attention to, as a real commitment that the CA cared to push the
idea forward.
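
To put rough numbers on the hypothetical commitment above, here is a minimal sketch. The function name, the choice to bill whole hours past the deadline, and the sample figures are illustrative assumptions, not part of any actual proposal.

```python
def delayed_revocation_penalty(hours_late_per_cert: list[int],
                               rate_per_cert_hour: int = 10_000) -> int:
    """Total payment owed for a batch of non-nominated certificates.

    Each entry is the number of whole hours one certificate stayed
    unrevoked past its deadline; certificates revoked on time
    (zero or negative hours) contribute nothing.
    """
    certificate_hours = sum(max(0, hours) for hours in hours_late_per_cert)
    return certificate_hours * rate_per_cert_hour


# Three certificates: one 48 hours late, one 12 hours late, one on time.
print(delayed_revocation_penalty([48, 12, 0]))
```

Because the charge scales with both the number of certificates and the length of the delay, even a modest batch delayed by a few days would run into the millions of dollars.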

> I think disclosure is cost of not being able to revoke in 5 days. I
> definitely agree with you that disclosing this information would be hard to
> make happen, but the cost to set up the experiment is pretty low.

I agree, the cost to set up the experiment is pretty low -- and it also
doesn't require ecosystem-wide consensus. A forward-thinking CA could
perform the experiment on their own: contact all their customers, stating
that, in the event of a need to revoke, the CA won't even consider
taking the hit and delaying revocation unless the customer has
previously nominated the impacted certificate as being potentially
problematic, along with the explanation for why. Publish that data,
raw, with serial numbers filed off, for analysis by the community.

This would allow said forward-thinking CA to gather the data that Tim's
proposal suggested (and I agree) would be useful, and would, I presume,
look good in the event that the CA *did* have to do a delayed revocation
("see, at least we *tried*!").

- Matt

Roman Fischer

Jul 25, 2024, 4:11:01 AM (2 days ago) Jul 25
to dev-secur...@mozilla.org

Dear Ben,

 

Thanks for your effort to re-ignite the discussion.

 

Personally, not speaking as a representative of my employer, I suggest two things to balance the interests:

  1. Remove the 3rd section "Mozilla recognizes…" from https://wiki.mozilla.org/CA/Responding_To_An_Incident . This would IMHO clarify that no exceptions are allowed.
  2. Introduce a third deadline of 15 (or 20) days in addition to the 24h and 5days deadlines in TLS BR 4.9.1.1 to cover certificates that were correctly validated, have all the right technical attributes (OID, key length, EKU, …) but don't 100% comply with the TLS BR or the CAs CP/CPS.

 

Finding good and workable solutions is often a give-and-take from all involved parties. Maybe by taking away the leeway for exceptions but giving the 3rd deadline is something that leaves all sides both happy and unhappy enough to accept it? 😉

 

Rgds

Roman

 

From: 'Ben Wilson' via dev-secur...@mozilla.org <dev-secur...@mozilla.org>
Sent: Wednesday, July 24, 2024 23:06
To: dev-secur...@mozilla.org
Cc: Mike Shaver <mike....@gmail.com>; Matt Palmer <mpa...@hezmatt.org>; Tim Hollebeek <tim.ho...@digicert.com>; Amir Omidi (aaomidi) <am...@aaomidi.com>; Wayne <rdau...@gmail.com>; Jeremy Rowley <jeremy...@digicert.com>
Subject: Re: Feasibility of a binding commitment to revoke before issuance

 

Thanks, everyone, for keeping this conversation going. It's essential that we continue because I believe the current framework is unworkable.

Ben

--

You received this message because you are subscribed to the Google Groups "dev-secur...@mozilla.org" group.
To unsubscribe from this group and stop receiving emails from it, send an email to dev-security-po...@mozilla.org.

Jesper Kristensen

Jul 25, 2024, 1:35:31 PM (2 days ago) Jul 25
to dev-secur...@mozilla.org

+1 to what Amir wrote.

Many CAs use the 5 day deadline to give the subscriber 5 days to replace their certificates. I don't think that is what the 5 days are for. Some incidents are obvious, and CAs should therefore be able to revoke in 24 hours, but others are less obvious. CAs may sometimes need time to determine if the certificate was misissued or not, and if it is misissued, find out how to fix the problem, deploy the fix, and find all the other certificates that suffer from the same misissuance. I think this is why they need to have 5 days to cover the edge cases.

Instead of giving CAs more time to revoke, maybe Mozilla could adopt some of Chrome's Moving Forward, Together initiatives that push towards supporting agility (and therefore fast certificate replacement) like max. 90 day end-entity certificates, lower validity periods for subordinate CAs, and require CAs to support ACME and ARI.

As a concrete proposal (which I admit might not be fully thought through), Mozilla could add a requirement that when a CA delays revocation because a subscriber requested a delay, then every certificate that the CA issues for the next two years that shares a SAN identifier with the delayed revocation certificate must have a validity of at most 30 days. This would incentivise subscribers who failed to design their systems for agility, while it should not be a big burden for subscribers who just had a bad day because their automation failed them once.
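
A rough sketch of how a CA's issuance pipeline might apply that rule follows. All names are hypothetical, the two-year window is approximated as 730 days, and the 398-day default maximum is an assumption about the limit that would otherwise apply.

```python
from datetime import date, timedelta

PENALTY_WINDOW = timedelta(days=2 * 365)  # "next two years", approximated
PENALTY_MAX_DAYS = 30


def max_validity_days(requested_sans: set[str],
                      issuance_date: date,
                      delayed_revocations: list[tuple[date, set[str]]],
                      default_max_days: int = 398) -> int:
    """Longest validity allowed for a new certificate under the rule.

    `delayed_revocations` holds one (date_of_delay, sans) pair for each
    certificate whose revocation was delayed at a subscriber's request.
    """
    for delay_date, delayed_sans in delayed_revocations:
        in_window = delay_date <= issuance_date <= delay_date + PENALTY_WINDOW
        if in_window and requested_sans & delayed_sans:
            return min(default_max_days, PENALTY_MAX_DAYS)
    return default_max_days


history = [(date(2024, 7, 1), {"example.com", "www.example.com"})]
print(max_validity_days({"www.example.com"}, date(2025, 1, 1), history))
print(max_validity_days({"other.example"}, date(2025, 1, 1), history))
```

Matching on any shared SAN keeps the check simple, but it also means a certificate covering many names would inherit the cap from a single affected name.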

Some large cloud providers (e.g. Cloudflare) have backup certificates from multiple CAs that they can use in case there is a problem with one CA without having to reissue a new certificate first. Maybe we should promote that as a best practice? I am not sure how many off-the-shelf ACME clients have support for this.

Maybe we don't need new policies but more resources to review bad incident reports, and sanctions other than distrust when CAs refuse to deliver. If I remember correctly, in the last six months Digicert is the only CA who at least attempted to comply with "the rationale must be provided on a per-Subscriber basis."


Amir Omidi

Jul 25, 2024, 2:49:27 PM (2 days ago) Jul 25
to Ben Wilson, Mike Shaver, Jeremy Rowley, Matt Palmer, Tim Hollebeek, Wayne, dev-secur...@mozilla.org
Hi Ben,

Thank you for outlining your view on the current problems. I think we're probably all in agreement that the current 5-day revocation period is not working effectively. To understand what's going on, we may need to treat this as a meta-incident. For example, what are the timelines involved here, what was the situation like in the past, and what is the root cause of these problems?

I fear that we're proposing action items without really understanding what has gone wrong. I do want to challenge the conclusion that the reason this is not working is definitely because of the 5-day revocation rule. The vast majority of the recent delayed revocation incidents would have still been delayed revocation incidents even if the period was extended to 20 days.

Here are a couple of observations I've had that may help with the analysis here:
  • This problem is not affecting every CA equally, and there does not seem to be a correlation between percentage of total issuance and delayed revocation incidents.
  • This problem seems to mainly be impacting OV and EV certificates. CAs that primarily issue DV certs have a much easier time getting revocations done on time.
  • Up until the recent distrust of Entrust by the Chrome Root Program, there has been no incentive for CAs to actually follow the rules. If I were a CA, I'd personally have a hard time justifying following the BRs when I could just tell root programs my customers are special.

Am I also to understand that, as we're in the process of figuring out what to do here, the 5-day revocation rule is effectively on pause from the Mozilla Root Program perspective?

Peter Harris

Jul 25, 2024, 4:51:10 PM (2 days ago) Jul 25
to dev-secur...@mozilla.org, Tim Hollebeek
On Monday, July 15, 2024 at 5:22:30 p.m. UTC-4 Tim Hollebeek wrote:

> The world would be better if we all knew, IN ADVANCE, which certificates are automatically replaceable, and which aren’t.  This would also greatly streamline operations when replacements are necessary, as it removes the burden on making the determinations with a ticking clock, which is a situation that doesn’t lend itself to careful and unbiased evaluations.
>
> The information could even be contained in a certificate extension, so that the rotation practices of these organizations is transparent.

This sounds like a great idea, provided that browsers only trust certificates with the "never revoke" extension if the duration of the certificate is sufficiently short. I was thinking 2 weeks, but maybe the 20 days proposed by Ben? Definitely less than 30 days.

If this idea is adopted, Mozilla (and other root store operators) can confidently add any misissued certificates without this extension to CRLite (or equivalent) 24 hours[1] after they are reported, regardless of whether the CA agrees to revoke by the deadline.

> One potential downside is that this would make critical certificates stick out like a sore thumb, but I think on balance, the transparency is more valuable than the disclosure risk.  I’ve never been a huge security by obscurity fan, anyway.

Agreed. The two hurdles I see are convincing the CABF to allow irrevocable certificates in the BR, and convincing customers who use certificates on critical infrastructure to use certificates with a shorter duration.

I did a bit of digging, and imagine my surprise to find out that the CABF already[2] incorporated this idea into the BRs as "Short‐lived Subscriber Certificates". So make that only one hurdle. The good news is that CAs can start offering these "irrevocable" certificates to customers today; there is no need to wait for any further BR changes or root store policy updates.

Peter Harris

[1] Assuming Mozilla et al adopt Amir's proposal to disallow the 5-day option

Ben Wilson

Jul 25, 2024, 4:53:18 PM (2 days ago) Jul 25
to Amir Omidi, Mike Shaver, Jeremy Rowley, Matt Palmer, Tim Hollebeek, Wayne, dev-secur...@mozilla.org

Thanks, Amir,

I appreciate your challenge to the assumption that the 5-day rule is the primary reason why we are where we are. We recognize that this is a difficult area, and many CAs have fallen short. Let's explore whether to create a meta-incident, but the more important part is that we come up with a meta-solution. As part of the effort, we should continue looking at the root causes to gain a more comprehensive understanding of the issues at play.

Here are my thoughts on the observations you’ve provided:

That this problem is not affecting every CA equally suggests that there are other variables at play. We ought to look at the culture of compliance, the nature of their customer bases, and the CAs' relationships with their customers, among other variables, to understand these differences, which will then help us to identify better solutions.

We agree that OV and EV certificates are more affected, in part, because the primary causes of misissuance are mistakes in the additional fields that such types of certificates contain, and perhaps the customers who use them.

We agree that CAs should not perceive that they can bypass the rules without consequence. We need to ensure that there are clear and consistent consequences for non-compliance, and the rules to achieve and maintain compliance need to be more clear.

I agree that the vast majority of the recent delayed revocation incidents would have still been delayed revocation incidents even if the period was extended to 20 days. However, I am hoping that a 20-day timeframe, along with an effort to phase out most of the excuses (by requiring quicker and more specific disclosures from CAs and their subscribers about reasons for delay), will reduce the scale of this issue. This effort will need collaboration by CAs and root stores. We should explore treating a failure to pre-disclose the required information, publicly, as a key focus of delayed revocation Bugzilla filings.

Finally, Mozilla also believes that automation, both in issuance and in replacement and revocation, is a path forward, but we need to move more in that direction first in the short term before it can become a long-term solution.

Also, regarding your final question, this discussion does not pause the BR requirement, but in dealing with CAs we shouldn't disregard the complexities of the issues presented.

Thanks again,

Ben



Suchan Seo

Jul 25, 2024, 11:29:33 PM (2 days ago) Jul 25
to dev-secur...@mozilla.org, Ben Wilson, Mike Shaver, Jeremy Rowley, Matt Palmer, Tim Hollebeek, Wayne, dev-secur...@mozilla.org, Amir Omidi
Would we want something between full revocation and leaving the certificate as is? For example, would it make sense to strip the EV/OV data from a malformed certificate: not fully revoke it, but treat it as DV (not showing the other data in the certificate, and warning the user when a client downloads or views the parsed certificate)?
Is there anything that actually reads the OV/EV data? Or could OCSP information amend the certificate (changed fields and a new signature over the modified fields)?

On Friday, July 26, 2024 at 5:53:18 AM UTC+9, Ben Wilson wrote:

Tim Callan

Jul 26, 2024, 9:26:28 AM (23 hours ago) Jul 26
to dev-secur...@mozilla.org, Suchan Seo, Ben Wilson, Mike Shaver, Jeremy Rowley, Matt Palmer, Tim Hollebeek, Wayne, dev-secur...@mozilla.org, Amir Omidi

Based on what we observe in recent and current delrev incidents, it defies belief that any strategy involving categorizing certificates into those that warrant immunity from the revocation rules will be accurate, fair, or in the best interest of the WebPKI.  Over the past four months we have watched more than a dozen public CAs choose to delay revocation of unambiguously misissued certificates for vast periods of time ranging into the span of months.  Many of these incidents involved delayed revocation for the majority of the affected certificates.  The CAs offer the flimsiest of excuses, or make no attempt at excuses at all.  They drag out the same, tired comments about lengthy approval processes and prohibitive regulations.  They use holidays, weekends, vacations, and end of quarter as excuses.  They say these systems are critical until it turns out that would be against the CPS in effect, at which point conveniently these systems are NOT critical anymore.

Any parent quickly learns to detect when you’re being handed a line, and Bugzilla is being handed lines left and right.  Most of these CAs don’t even display the creativity to make up their own bad fabrications and instead simply crib bad fabrications from those who have come before.  The poster child here is the obligatory misrepresentation of Mozilla’s delayed revocation policy.  This policy emphasizes that CAs are expected to follow the BR revocation deadlines every time, but CA after CA conveniently omits that part of the policy as they wave their hands around why this is not blatant disregard for the rules, as if the rest of us somehow lack the ability to look up the original policy to read and understand what it says.

It’s a sad state for the set of companies that supposedly are the guardians of public identity and the security of the WebPKI.

Regretfully, I for one have come to the conclusion that we cannot rely on Subscribers and their CAs to fairly categorize certificates into those qualifying for some kind of extended revocation timeline and those that do not.  If we are to take reporting CAs at their word, then we know Subscribers have a propensity to actively lie to CAs or omit the facts if they see it being to their advantage in gaining a revocation delay.  Or the more disturbing thought is that the CAs themselves are omitting, or lying, or coaching their Subscribers on how to omit and lie so that they can reliably delay revocation of certificates.

The fact is that Subscribers actually are able to replace these certificates on time.  They simply don’t want to.  It’s a hassle.  It takes time away from other projects.  It messes with their evenings and weekends.  Sometimes it costs extra budget.  Hell, if they can delay it long enough, a nontrivial number of certs will expire on their own and won’t require revocation at all. So, when given the opportunity to represent their processes and systems as incapable of agile certificate replacement, Subscribers sing that tune.  When given the opportunity to explain the results of forced on-time revocation as disastrous, they sing that tune also.

CAs, likewise, are motivated the wrong way. They can be sticklers and make their paying customers angry at them, or they can be lenient and become heroes in the customers’ eyes.  This can be a powerful temptation, and we have seen CAs succumb over and over again.

Perhaps if we could create some kind of objective, perfect, fair, consistent, and externally measurable criteria for certificate use cases and circumstances warranting a revocation delay, then those rules could be enacted for all CAs to follow equally.  But I don’t see any credible candidates for these criteria, have never heard a legitimate proposal for such a thing, and do not believe it is possible.

Making CA opinion the basis for judging which certificates deserve a deadline extension is also unworkable.  We have the abovementioned problems with Subscriber and CA credibility. Additionally, there is simply no way a CA has the visibility and detailed operational knowledge needed to genuinely evaluate a Subscriber’s ability to swap out certificates in a given timeframe and the consequences of failure to do so.

There is, however, an organization that is intimately familiar with the Subscriber’s processes and abilities and the consequences of missing certificate replacement on time.  This organization is capable of making risk/reward tradeoffs for certificate agility versus other initiatives and can enact real-time resource and process adjustments to deal with unforeseen revocations.  That organization is, of course, the Subscriber whose certs are up for revocation.

Subscribers who learn their certificates will be disappearing at a specific time roughly 100 hours from now always seem to have the new certificates installed before the old ones die.  Always.  We know this at Sectigo because for the past few years we have not entertained delayed revocation requests by any Subscriber in any environment for any reason.  We simply let them know when their certificates will stop working and focus on helping them obtain and install replacements.  And I personally believe, unless it becomes codified as an exception policy in the relevant regulations, that Sectigo will never entertain the idea of purposefully delaying revocation again.

I firmly believe this to be the only viable path forward.  We need to abolish the deliberate delay of mandated revocations. 

Removal of deliberate delrevs serves the WebPKI in many ways.

- It maintains a clean and compliant certificate base for Relying Parties to trust.

- It increases motivation for CAs to strive for error-free operations.

- It encourages automation and certificate agility among Subscribers, who know they won’t be able to talk their way out of a revocation event.

- It is consistent, fair, simple to understand, and easily measured.

- It teaches CAs and anyone else watching the WebPKI that the rules matter and must be followed.

- It eliminates counterproductive motivators influencing CAs today.

- There is a clear path to success that every CA has the technical and procedural ability to execute.

Of course, a reading of the Baseline Requirements and the major root store program guidelines will reveal this requirement today.  However, we are missing meaningful, reliable consequences for failure to comply.  In each of these cases the CA believed the negatives of transparent disobedience to the BRs to be less than the negatives of completing the revocation.  Otherwise they would not have delayed. 

And with few exceptions they were probably right.  Most of the CAs with willful delrev incidents from the past four months, or the past four years, will not face distrust as a direct result.  And in an ecosystem with no other penalty for noncompliance, this means most CAs are pragmatically motivated to appease their paying customers – or their bosses in the larger organizations that own them – even at the expense of the WebPKI.  Right now, the penalty is that you have to write up a Bugzilla incident and answer uncomfortable questions from a few nosy jerks for a couple of months until everyone gives up in frustration and lets you close the bug.  In many cases, the CA judges this to be considerably less painful than angering one or more of the Subscribers that keep the CA operational.

We need root programs to enforce these rules with enough power to tip the decision-making scales.  We need CAs to dread the consequences of delayed revocation more than they dread Subscriber displeasure.  That has to mean either 1) that the likelihood of root distrust goes up dramatically among one or more major root programs or 2) that the WebPKI comes up with some kind of alternative consequence.  This consequence would have to be painful enough to seriously demotivate intentional delay but not so severe that browsers are unwilling to use it.

Ben Wilson

Jul 26, 2024, 12:13:45 PM (20 hours ago) Jul 26
to Tim Callan, dev-secur...@mozilla.org, Suchan Seo, Mike Shaver, Jeremy Rowley, Matt Palmer, Tim Hollebeek, Wayne, Amir Omidi

All,

In addition to the ideas stated in my previous email, I’d like to get your thoughts on some of the steps we can take while we discuss the current 5-day revocation requirement in BR 4.9.1.1. 

We can, and should, develop more stringent disclosure requirements that compel CAs to provide advance descriptions of the circumstances under which they cannot revoke a certificate within the required timeframe. My proposal is that we modify Mozilla's guidance on delayed revocation. We will preserve the statement that “Mozilla does not grant exceptions to the BR revocation requirements.” The revisions would mandate full disclosure in advance so that the community can evaluate the CA's and subscriber's arguments for delayed revocation. Also, there have been too many instances where CAs have failed to include all the necessary information in preliminary delayed revocation incident reports. Moving forward, and for existing delayed revocation bugs, CAs would need to closely follow the updated instructions, which would require specificity when claiming exceptional circumstances, significant harm, or critical infrastructure, all on a per-certificate basis rather than on a per-subscriber basis.  Moreover, CAs would be required to attest that they have communicated with, or will shortly communicate with, their auditors, supervisory bodies (if applicable), and all Root Stores that they participate in to indicate that they have begun a process to analyze the risk and formulate a remediation plan to address delayed revocation. This comprehensive approach would ensure that we are not just relying on CA opinion but are creating a structured, transparent process that the entire community can trust and verify.

Thoughts?

Ben

Wayne

Jul 26, 2024, 12:26:26 PM (20 hours ago) Jul 26
to dev-secur...@mozilla.org
Hi Ben,

I believe that the alteration to the guidance on delayed revocation is a bit mixed. We're preserving that Mozilla does not grant exceptions, but continuing to outline a method for exceptions to be permitted. History has shown this will only be used to generate excuses for too much workload, and that merely generating compliance paperwork is counter-productive to subscriber-communication and revocation in a timely manner.

While I appreciate the additional caveats that are being added, I would suggest that the lack of adherence to those currently in place is the higher priority. When such changes are published, how long will CAs have to adapt to this new policy? If we are unsure if they are capable of complying with the simpler policy as-is, surely a more complex one is going to generate a worse compliance rating?

Note that I am not advocating that we pursue policy based on whether most CAs will comply with it, as that is how we get bad policy. We should instead look at the outcomes we want: timely revocation, a stricter standard for documenting delayed revocation, and guarantees that future incidents will not become a repeating pattern. If we take those as a simple framework, then surely focusing on enforcement to that end would be more prudent? Otherwise I fear we are hoping that minor wording changes will ensure the CAs are competent in their activities going forward.

As far as any enforcement mechanisms, is a public record card out of the question in the interim? A simple catalogue of missed question timelines, quantity of missed certificate by key points, how often the CCADB/Mozilla incident policy must be quoted each incident? If a CA wishes to show their adherence then having a publicly documented reputation will shift the incentives too. I don't expect a thorough guidance and set of rules here, but a simple record will be better than the state we have now - especially for assessing prior issues against CAs.

Likewise, a CA should not view having their history and prior failures documented as the end of the world. Much like the incidents themselves, they're a list of problems that have happened and show every CA where steps must be improved as an industry. I would instead counter that having no such documentation after a few years as a CA is more grounds for concern. No one is perfect, and pursuing policy to that end is going to encourage a culture of hiding faults when we need to have them documented as thoroughly and publicly as possible.

- Wayne

Ben Wilson

Jul 26, 2024, 12:52:13 PM (19 hours ago) Jul 26
to Wayne, dev-secur...@mozilla.org

Thanks, Wayne.

We understand that it might appear we are perpetuating a misunderstanding that CAs and subscribers can claim exceptions. However, our intention is to phase out this approach over time while we gather more information to inform our decisions. We believe this effort will lead to stricter requirements and the eventual elimination of delayed revocations.

Thanks again,

Ben



Mike Shaver

Jul 26, 2024, 1:09:11 PM (19 hours ago) Jul 26
to Ben Wilson, Wayne, dev-secur...@mozilla.org
Why do you (plural?) believe that? What evidence is there that increasing the revocation window will lead to any stricter requirements?

Did revocation get faster when the deadline was extended the last time?

I agree with Tim wholeheartedly here. The delayed revocation issues are ones that stem from poor management by the certificate authorities in the Mozilla store, and if they managed their operations better (including ensuring that their subscribers actually know what legal commitments they’re making, which many have claimed that they do not!) then they would be able to meet the revocation deadline, which was already extended, without difficulty.

CAs and subscribers who misissue, or misuse the web PKI (such as for web-critical infrastructure), *MUST* be the ones who bear the costs of those misissuances and misuses, or the situation will never improve. The proposal to extend the revocation deadline legitimizes ridiculous complaints of poorly operated CAs, and gains nothing for the fragile integrity of the web PKI. It is counter to the interests of Firefox’s users, and all users of the web.

There is no reason other than CA and subscriber convenience to make such an extension, and the web has already suffered too much from indulgence of CAs — who have the root password to all of the web’s security — who have shown that they are not taking their roles seriously enough. This laxity allowed Entrust to operate incompetently for FOUR YEARS without any meaningful pushback from the Mozilla root program on their repeated failure to meet the commitments to the root program, until unrelated community members decided to volunteer time to uphold Mozilla’s own standards in Mozilla’s own forum. Entrust was defended as having been “responding appropriately” even months into that most recent exposure of their failures to meet Mozilla’s policies.

This is not the time to carry more water for CAs who don’t want to be held responsible for operating correctly with their incredible, practically-unchecked power. It is the time for Mozilla’s root program to decide if it wants to be taken seriously as a protector of Firefox’s users, and if it will give any reason for CAs to actually follow the rules (absent, or even *following* the actions of other root programs). Even Entrust is still trusted fully in the Mozilla root store, without any comment from Mozilla about the (incredibly, obviously terrible) series of reports that were submitted.

*If* there is to be value in permitting misissued certificates to linger for more time on the web, then I submit that this is not the time to do it, and Mozilla is not who should be championing it. Mozilla has an influence on the web that is outsized with respect to Firefox’s marketshare, and pushing this direction will only diminish it.

Mike

Walt

Jul 26, 2024, 1:26:07 PM (19 hours ago) Jul 26
to dev-secur...@mozilla.org, Mike Shaver, Wayne, dev-secur...@mozilla.org, Ben Wilson
All,

I would have to second the opinions of Mike and Wayne. It is very very hard to unring that bell, especially given the perverse financial incentive CAs have against following the rules.

A CA that is lax on revocation with enterprise customers is more likely to retain customers than a CA that follows the rules, because the one that follows the rules is going to have their account managers and CSMs fielding questions like "we pay you how much and you couldn't run air cover for us?" (There's a separate conversation to be had on whether or not there's even any point in selling or buying an EV certificate in 2024 other than getting wined and dined by a vendor but that's neither here nor there.)

We see these delays play out among "Enterprise" CAs, where there is custom paper or off-the-record promises by a salesperson, and we don't see them in B2C contexts. Why should enterprises get more wiggle room, when they're the ones that should have better controls and processes in place for a security incident? The only real power a root program has in this ecosystem is distrust. If a CA is acting in bad faith, they should not be given more leeway, they should be given less.

We see numerous cases of having to pull accurate answers out of CAs like pulling teeth, of CAs dodging the 7-day answer period, and of CAs simply refusing to follow the most basic principles of the root program agreement. If a child is being a brat, the way to get them to behave is not to give them a longer leash, but to put them on a shorter one. I would also argue that this statement:

> Currently, the 5-day revocation period is not working effectively, as evidenced by ongoing issues documented in Bugzilla. As I said before, I’d like to reach a consensus determination on what is best for the ecosystem. While I understand the argument for stricter revocation timelines, I believe there are broader considerations based on how this valuable TLS technology is currently being used to support healthcare, airlines, banking, etc.

is not entirely accurate, as certain CAs have managed to revoke millions of certificates and replace two thirds of them in more or less 24 hours. The way to make the 5-day revocation period work effectively is to create an incentive to do it properly, not to extend the time available to revoke (as we'll simply end up in the same boat with a 20-day period). The incentive in this case is simply to enforce the powers of the root program. Once it's made clear that a root program is willing and able to enforce stronger restrictions (validity period, and so on) and/or distrust a CA, it will become common knowledge across the industry that CAs shouldn't be offering promises that they shouldn't be keeping, but have been able to keep due to inconsistent enforcement.

All a 20 day period would do is make it seem even less urgent, and continue to be kicked down the road by subscribers and CAs.

Additionally, if we follow the CIA triad, having your certificate revoked is absolutely a security incident (availability), and if an enterprise can revoke/replace within 5 days due to a key compromise, they can absolutely replace within 5 days due to a typographical error.

Cheers, 

Walter

Ben Wilson

Jul 26, 2024, 1:54:15 PM (18 hours ago) Jul 26
to Walt, dev-secur...@mozilla.org, Mike Shaver, Wayne
Thank you all for the feedback on these ideas, we'll take some time to reflect on the issues and think through how to best address them.
Ben

Amir Omidi

Jul 26, 2024, 4:42:37 PM (16 hours ago) Jul 26
to Ben Wilson, Walt, dev-secur...@mozilla.org, Mike Shaver, Wayne
As part of this I'd also be interested in what enforcement mechanisms you're thinking of. Say the community adopted the 20-day revocation category: what happens if a CA breaks that rule?

Matt Palmer

Jul 26, 2024, 11:33:21 PM (9 hours ago) Jul 26
to dev-secur...@mozilla.org
On Fri, Jul 26, 2024 at 10:13:31AM -0600, Ben Wilson wrote:
> In addition to the ideas stated in my previous email, I’d like to get your
> thoughts on some of the steps we can take while we discuss the current
> 5-day revocation requirement in BR 4.9.1.1.
>
> We can, and should, develop more stringent disclosure requirements that
> compel CAs to provide advance descriptions of the circumstances under which
> they cannot revoke a certificate within the required timeframe. My proposal
> is that we modify Mozilla's guidance on delayed revocation
> <https://wiki.mozilla.org/CA/Responding_To_An_Incident#Revocation>. We will
> preserve that statement that “Mozilla does not grant exceptions to the BR
> revocation requirements.” The revisions would mandate full disclosure in
> advance so that the community can evaluate the CA's and subscriber's
> arguments for delayed revocation.

I don't agree with wording it as "full disclosure in advance so that the
community can evaluate", because that could be read as suggesting that
Mozilla and/or the community are somehow "judging" reasons, and can
provide some sort of "injunctive relief" to delay revocation. Instead,
by my understanding, the intention is to gather more useful information
about why revocation is sometimes delayed, for dissemination both to
other CAs and their subscribers (so as to inform operational and
organisational practices), and to the WebPKI community, to provide input
into policy making. I don't have any specific wording suggestions, but
I think changing the wording to more accurately emphasise the
"information gathering" purpose would be an improvement.

Also, I'd like that information to be gathered not just when full-blown
incidents occur, but also in what could be considered a kind of "near
miss". Given that all of 4.9.1.1's revocation reasons are, at a
minimum, "SHOULD revoke within 24 hours", with a subset of reasons being
"MUST revoke within five days", if we want to gather information about
why revocations are delayed, why not make it a requirement of the
Mozilla root program that *any* revocation that takes more than 24 hours
requires reporting, and any revocation beyond the 5 day "MUST" of
4.9.1.1 be a full-blown, auditor-reportable incident?
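
The two thresholds suggested above could be expressed roughly as follows.
This is a sketch only: the names are invented for illustration, and it
assumes the clock starts at some agreed "trigger" time (for instance, when
the CA becomes aware of the problem), which would need to be pinned down
in actual policy.

```python
from datetime import datetime, timedelta
from enum import Enum


class Outcome(Enum):
    ON_TIME = "revoked within 24 hours, nothing to report"
    REPORT = "missed the 24-hour SHOULD, report the reasons"
    INCIDENT = "missed the 5-day MUST, full auditor-reportable incident"


# Thresholds taken from the proposal above.
SHOULD_WINDOW = timedelta(hours=24)
MUST_WINDOW = timedelta(days=5)


def classify_revocation(trigger: datetime, revoked: datetime) -> Outcome:
    """Classify one revocation by how long it took after the trigger time."""
    elapsed = revoked - trigger
    if elapsed <= SHOULD_WINDOW:
        return Outcome.ON_TIME
    if elapsed <= MUST_WINDOW:
        return Outcome.REPORT
    return Outcome.INCIDENT


# Hypothetical example: a certificate revoked three days after the trigger.
trigger = datetime(2024, 7, 1, 12, 0)
near_miss = classify_revocation(trigger, trigger + timedelta(days=3))
```

Under this reading, the three-day case is a "near miss": not a violation of
the 5-day MUST, but still something the CA would report.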

We want information, and a representative of one CA seems to want CAs to
provide that information, and given that a "SHOULD" means "the full
implications must be understood and carefully weighed before choosing a
different course", CAs SHOULD (heh) already have the necessary
information at hand before they make the decision to delay past 24
hours, thus there should be minimal additional imposition on CAs to
report that information to the WebPKI community.

> Also, there have been too many instances
> where CAs have failed to include all the necessary information in
> preliminary delayed revocation incident reports. Moving forward, and for
> existing delayed revocation bugs, CAs would need to closely follow the
> updated instructions, which would require specificity when claiming
> exceptional circumstances, significant harm, critical infrastructure, and
> all would be on a per-certificate basis rather than on a per-subscriber
> basis.

I fully support this, and would furthermore make the guidance
*extremely* specific, something like this:

For each affected certificate, the customer's description of precisely
why the certificate cannot be replaced within the 24 hour timeframe
advised by the Baseline Requirements, and if there are external factors
involved in that decision, a reference to publicly-available normative
documentation describing those factors in detail. For internal factors
that impact on the inability to revoke, details provided must be
sufficient to allow an independent individual reasonably versed in the
administration of IT systems to fully understand the operational and
organisational failures which led to the inability to revoke in a timely
manner.

I'm not 100% happy with the "internal factors" part of that description,
as it is still somewhat open to (creative mis)interpretation, but I
think it's a more objective standard than what is currently provided.

Another thought has just come to mind, regarding the issue that others
have raised around encouraging a "race to the bottom", with lax CAs
gaining business over their more rigorous competitors, and relating to
the proposal put forward by Tim Hollebeek that started this thread: that
if a certificate is not revoked within the mandatory limits set forth in
the BRs, AND that certificate was not previously publicly disclosed as
being "problematic for revocation", then no further certificates may be
issued for any eTLD+1 contained within that certificate which chains to
a trusted root under the control of the same organisation which
controlled the root that the delayed-revocation certificate chained to.

This is, I agree, a fairly drastic measure, however it is less drastic
than what is, really, the only other stick Mozilla has: total distrust.
It would give CAs a *big* talking point to push back on customers which
demand the CA carry their water, specifically "if we delay revocation,
you'll have to find a new CA to work with to get those new certs, which
is almost certainly going to be more of a pain in the arse for you than
having to scramble internally and do whatever you need to do to replace
those certs with new ones we can give you". It also aligns CA financial
interests with that of the community, because there's no longer any
reason to play fast-and-loose with the BRs to keep the customer, because
playing fast-and-loose *guarantees* you'll lose at least some revenue,
because you can't issue that customer certificates any more (for the
problematic domains, at least).

This is also a control whose violation *can* be externally detected (via
CT logs), and even programmatically enforced if necessary, and thus it
is more effective than purely administrative-level controls.
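
As a rough illustration of how such a check could be automated, here is a
sketch under several assumptions: the record types are hypothetical, the
eTLD+1 values are assumed to have been extracted already (doing that
properly needs the Public Suffix List, which is not implemented here), and
mapping a logged certificate to the organisation controlling its root is
assumed to be available from elsewhere. No real CT log API is used.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class DelayedRevocation:
    operator: str                 # organisation controlling the root (assumed known)
    domains: frozenset[str]       # eTLD+1 values in the late-revoked certificate
    deadline: datetime            # the mandatory BR deadline that was missed
    disclosed_in_advance: bool    # previously disclosed as "problematic for revocation"


@dataclass(frozen=True)
class LoggedIssuance:
    operator: str                 # organisation controlling the issuing root
    domains: frozenset[str]       # eTLD+1 values in the newly logged certificate
    not_before: datetime


def find_violations(delays, issuances):
    """Return (certificate, domain) pairs issued in breach of the proposed ban."""
    # Earliest missed deadline per (operator, domain); disclosed certs do not trigger a ban.
    banned = {}
    for delay in delays:
        if delay.disclosed_in_advance:
            continue
        for domain in delay.domains:
            key = (delay.operator, domain)
            if key not in banned or delay.deadline < banned[key]:
                banned[key] = delay.deadline

    violations = []
    for cert in issuances:
        for domain in sorted(cert.domains):
            ban_start = banned.get((cert.operator, domain))
            if ban_start is not None and cert.not_before > ban_start:
                violations.append((cert, domain))
    return violations
```

Given a feed of delayed-revocation records and newly logged certificates,
anyone watching the logs could run a comparison like this, not just the
root programs.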

- Matt
