Incident report D-TRUST: syntax error in one TLS certificate


Enrico Entschew

Nov 23, 2018, 10:25:01 AM
to mozilla-dev-s...@lists.mozilla.org
This post links to https://bugzilla.mozilla.org/show_bug.cgi?id=1509512

Syntax error in one TLS certificate

1. How your CA first became aware of the problem (e.g. via a problem report submitted to your Problem Reporting Mechanism, a discussion in mozilla.dev.security.policy, a Bugzilla bug, or internal self-audit), and the time and date.

We became aware of the issue via https://crt.sh/ on 2018-11-12, 09:01 UTC.

2. A timeline of the actions your CA took in response. A timeline is a date-and-time-stamped sequence of all relevant events. This may include events before the incident was reported, such as when a particular requirement became applicable, or a document changed, or a bug was introduced, or an audit was done.

Timeline:
2018-11-12, 09:01 UTC CA became aware via https://crt.sh/ of a syntax error in one TLS certificate issued on 2018-06-02. The PrintableString value of the serialNumber attribute (OID 2.5.4.5) contains an invalid character. For more details see https://crt.sh/?id=514472818
2018-11-12, 09:30 UTC The CA Security Issues task force analyzed the error and recommended next steps.
2018-11-12, 10:30 UTC Customer was contacted for the first time. The customer runs an international critical trade platform for emissions. Immediate revocation of the certificate would cause irreparable harm to the public.
2018-11-12, 13:00 UTC We provided additional dedicated coaching on this specific syntax topic to the validation team to avoid this kind of error in the future.
2018-11-16, 08:40 UTC Customer responded for the first time and asked for more time to evaluate the certificate replacement process.
2018-11-19, 12:30 UTC CA informed the auditor TÜV-IT about the issue.
2018-11-20, 15:19 UTC Customer declared they would replace the certificate by 2018-11-22 at the latest.
2018-11-22, 15:52 UTC A new certificate was applied for and issued.
2018-11-22, 16:08 UTC The certificate with the serial number 3c 7c fb bf ea 35 a8 96 c6 79 c6 5c 82 ec 40 13 was revoked by the customer.

3. Whether your CA has stopped, or has not yet stopped, issuing certificates with the problem. A statement that you have will be considered a pledge to the community; a statement that you have not requires an explanation.

The CA has not stopped issuing EV certificates. We applied dedicated coaching on this specific syntax topic within the validation team to avoid this kind of error until software adjustments to both affected systems have been completed.

4. A summary of the problematic certificates. For each problem: number of certs, and the date the first and last certs with that problem were issued.

1 Certificate
SHA-256 41F3AD0CBDA392F078D776FD1CDC0E35F7AF61030C56C7B26B95936F41A83B32
Issued on 2018-06-01

5. The complete certificate data for the problematic certificates. The recommended way to provide this is to ensure each certificate is logged to CT and then list the fingerprints or crt.sh IDs, either in the report or as an attached spreadsheet, with one list per distinct problem.

For more details see https://crt.sh/?id=514472818

6. Explanation about how and why the mistakes were made or bugs introduced, and how they avoided detection until now.

This problem was caused by the customer-facing frontend system and the linting system. Neither system correctly checked the entry in the serialNumber field (OID 2.5.4.5): it was possible to enter characters other than those defined by the PrintableString definition.
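
For illustration, a minimal check of the kind both systems were missing might look like this (a Python sketch; the permitted alphabet is taken from the PrintableString definition in ITU-T X.680, and the example serialNumber values are hypothetical):

    import string

    # Characters permitted in an ASN.1 PrintableString (ITU-T X.680):
    # A-Z, a-z, 0-9, space, and ' ( ) + , - . / : = ?
    PRINTABLE_STRING_CHARS = set(
        string.ascii_letters + string.digits + " '()+,-./:=?")

    def is_valid_printable_string(value: str) -> bool:
        return all(ch in PRINTABLE_STRING_CHARS for ch in value)

    assert is_valid_printable_string("CSM018391")       # hypothetical value
    assert not is_valid_printable_string("CSM_018391")  # '_' is not permitted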

7. List of steps your CA is taking to resolve the situation and ensure such issuance will not be repeated in the future, accompanied with a timeline of when your CA expects to accomplish these things.

The CA Security Issues task force, together with software development, analyzed the error. We applied dedicated coaching on this specific syntax topic within the validation team to avoid this kind of error until software adjustments to both affected systems have been completed. The changes to the systems are expected to go live in early January 2019.

Thank you
Enrico Entschew
D-TRUST GmbH

Paul Léo Steinberg

Nov 25, 2018, 5:13:42 PM
to mozilla-dev-s...@lists.mozilla.org
> 2018-11-12, 09:01 UTC CA became aware via https://crt.sh/ of a syntax error in one tls certificate issued on 2018-06-02. The PrintableString of OBJECT IDENTIFIER serialNumber (2 5 4 5) contains an invalid character. For more details see https://crt.sh/?id=514472818
> 2018-11-12, 10:30 UTC Customer was contacted the first time. Customer runs an international critical trade platform for emissions. Immediate revocation of the certificate would cause irreparable harm to the public.
> 2018-11-22, 16:08 UTC The certificate with the serial number 3c 7c fb bf ea 35 a8 96 c6 79 c6 5c 82 ec 40 13 was revoked by customer.

Going forward, if the platform is that important, have you advised the customer to have a second certificate from a different CA (with a different key) ready for emergencies? Being too big to fail seems like a really lame excuse for not planning ahead.

Additionally, if the platform's operation is critical, it would seem to be a good idea to apply an even stricter standard of security than mandated by the BRs, rather than to relax it (revocation in more than 10 days instead of less than 24 hours). E.g., it seems like a bad idea, though permitted by the BRs, to issue certificates with a lifetime of around 2 years to such a service, while MUCH shorter lifetimes would seem more appropriate.

If, on the contrary, you are arguing that availability is more important than security, operating the service over unencrypted HTTP would be wise.

Gijs Kruitbosch

Nov 26, 2018, 5:06:23 AM
to Enrico Entschew
(for the avoidance of doubt: posting in a personal capacity)

On 23/11/2018 15:24, Enrico Entschew wrote:
> Timeline:
> 2018-11-12, 10:30 UTC Customer was contacted the first time. Customer runs an international critical trade platform for emissions. Immediate revocation of the certificate would cause irreparable harm to the public.
<snip>
> 2018-11-22, 16:08 UTC The certificate with the serial number 3c 7c fb bf ea 35 a8 96 c6 79 c6 5c 82 ec 40 13 was revoked by customer.

Some questions I have:

1) Don't the BRs specify that CAs MUST revoke within 24 hours (for some
issues) or 5 days (for others)? This looks like just over 10 days, and
was customer-prompted as opposed to set by the CA, it seems. Am I just
missing the part of the BRs that says ignoring the 5 days is OK if it's
"just" a syntax error?

2) what procedure does D-TRUST follow to ensure adequate revocation
times, and in particular, under what circumstances does it decide that
not revoking until the customer gives an OK is necessary (e.g. how does
it decide what constitutes an "international[ly] critical" site)? Is
this documented, e.g. in CPS or similar? Have auditors signed off on that?

3) can you elaborate on the system being down causing "irreparable
harm"? What would have happened if the cert had just been revoked after
24/120 hours? In this case, the website in question ( www.dehst.de ) has
been broken in Firefox for the past 64 or so hours (ie since about 6pm
UK time on Friday, when I first read your message) because the server
doesn't actually send the full chain of certs for its new certificate.
Given that the server (AFAICT) doesn't staple OCSP responses, I don't
imagine that practical breakage in a web browser would have been worse
if the original cert had been revoked immediately, given the CRL
revocation done last week hasn't appeared in CRLSet/OneCRL either.

~ Gijs

Nick Lamb

Nov 26, 2018, 10:31:52 AM
to Enrico Entschew, dev-secur...@lists.mozilla.org
In common with others who've responded to this report, I am very skeptical about the contrast between the supposed importance of this customer's systems and their, frankly, lackadaisical technical response.

This might all seem harmless, but it ends up as "the boy who cried wolf". If you relay laughable claims from customers several times, then when it comes to an incident where maybe some extraordinary delay was justifiable, any goodwill is already used up by the prior claims.

CA/B is the right place for CAs to make the case for a general rule about giving themselves more time to handle technical non-compliances whose correct resolution will annoy customers but impose little or no risk to relying parties. I personally, at least, would much rather see CAs actually formally agree that they should all have, say, 28 days in such cases - even though that's surely far longer than it should be - than a series of increasingly implausible "important" but ultimately purely self-serving undocumented exceptions that make the rules on paper worthless.

Jakob Bohm

Nov 26, 2018, 12:12:49 PM
to mozilla-dev-s...@lists.mozilla.org
It should be noted that the counter-measures that some posts have
expected of the end-site in question may not always be realistic
(speaking generally, as I have no data on the specifics of this end-
site):

1. Having a spare certificate ready (if done with proper security, e.g.
a separate key) from a different CA may unfortunately conflict with
badly thought out parts of various certificate "pinning" standards.

2. Being critical from a society perspective (e.g. being the contact
point for a service to help protect the planet), doesn't mean that the
people running such a service can be expected to be IT superstars
capable of dealing with complex IT issues such as unscheduled
certificate replacement due to no fault of their own.

3. Not every site can be expected to have the 24/7 staff on hand to do
"top security credentials required" changes, for example a high-
security end site may have a rule that two senior officials need to
sign off on any change in cryptographic keys and certificates, while a
limited-staff end-site may have to schedule a visit from their outside
security consultant to perform the certificate replacement.

Thus I would be all for an official BR ballot to clarify/introduce
that 24 hour revocation for non-compliance doesn't apply to non-
dangerous technical violations.

Another category that would justify a longer CA response time would be a
situation where a large batch of certificates need to be revalidated due
to a weakness in validation procedures (such as finding out that a
validation method had a vulnerability, but not knowing which if any of
the validated identities were actually fake). For example to recheck a
typical domain-control method, a CA would have to ask each certificate
holder to respond to a fresh challenge (lots of manual work by end
sites), then do the actual check (automated).



Enjoy

Jakob
--
Jakob Bohm, CIO, Partner, WiseMo A/S. https://www.wisemo.com
Transformervej 29, 2860 Søborg, Denmark. Direct +45 31 13 16 10
This public discussion message is non-binding and may contain errors.
WiseMo - Remote Service Management for PCs, Phones and Embedded

Jakob Bohm

Nov 26, 2018, 12:34:38 PM
to mozilla-dev-s...@lists.mozilla.org
In addition to this, would you add the following:

- Daily checks of crt.sh (or some other existing tool) if
additional such certificates are erroneously issued before
the automated countermeasures are in place?

- Procedurally (and eventually technically) restrict the serial number
  element to actual validated identification numbers from a fixed set of
  databases for each jurisdiction. For example, for a Bundesamt this
  should be a special prefix followed by some kind of official
  identifying number of entities within the Bundesverwaltung. Similarly,
  of course, for Landesämter, companies, etc.
  Also, it is unclear why a Bundesamt belongs to an identification
  jurisdiction lower than the entire BRD.
  For comparison, Danish company entities are fully identified by the
  numeric part of their VAT number (or a number from the same national
  database if they have no VAT registration), and this is what CAs put
  in the serialNumber distinguished name element. Danish government
  entities also have unique numbers from a different database, but I
  haven't found a government certificate with such a number in the
  serialNumber element.

Ryan Sleevi

Nov 26, 2018, 6:47:46 PM
to Nick Lamb, enr...@entschew.com, MDSP
On Mon, Nov 26, 2018 at 10:31 AM Nick Lamb via dev-security-policy <
dev-secur...@lists.mozilla.org> wrote:

> CA/B is the right place for CAs to make the case for a general rule about
> giving themselves more time to handle technical non-compliances whose
> correct resolution will annoy customers but impose little or no risk to
> relying parties,
>

CAs have made the case - it was not accepted.

On a more fundamental and philosophical level, I think this is
well-intentioned but misguided. Let's consider that the issue is one that
the CA had the full power-and-ability to prevent - namely, they violated
the requirements and misissued. A CA is only in this situation if they are
a bad CA - a good CA will never run the risk of "annoying" the customer.

This also presumes that "annoyance" of the subscriber is a bad thing - but
this is also wrong. If we accept that CAs are differentiated based on
security, then a CA that regularly misissues and annoys its customers is a
CA that will lose customers. This is, arguably, better than the
alternative, which is to remove trust in a CA entirely, which will annoy
all of its customers.

This presumes that the customer cannot take steps to avoid this. However,
as suggested by others, the customer could have minimized or eliminated
annoyance, such as by ensuring they have a robust system to automate the
issuance/replacement of certificates. That they didn't is an operational
failure on their part.

This presumes that there is "little or no risk to relying parties."
Unfortunately, they are by design not a stakeholder in those conversations
- the stakeholders are the CA and the Subscriber, both of which are
incentivized to do nothing (it avoids annoying the customer for the CA, it
avoids having to change for the customer). This creates the tragedy of the
commons that we absolutely saw result from browsers not regularly enforcing
compliance on CAs - areas of technical non-compliance that prevented
developing interoperable solutions from the spec, which required all sorts
of hacks, which then subsequently introduced security issues. This is not a
'broken windows' argument so much as a statement of the demonstrable
reality we lived in prior to Amazon's development and publication of
linting tools that simplified compliance and enforcement, and the
subsequent improvements by ZLint.

Conceptually, this is similar to an ISP that regularly cuts its own
backbone cables or publishes bad routes. By ensuring that the system
consistently functions as designs - and that the CA follows their own
stated practices and procedures and revokes everything that doesn't - the
disruption is entirely self-inflicted and avoidable, and the market can be
left to correct for that.


> I personally at least would much rather see CAs actually formally agree
> they should all have say 28 days in such cases - even though that's surely
> far longer than it should be - than a series of increasingly implausible
> "important" but ultimately purely self-serving undocumented exceptions that
> make the rules on paper worthless.
>

I disagree that encouraging regulatory capture (and the CA/Browser Forum
doesn't work by formal agreement of CAs, nor does it alter root program
expectations) is the solution here.

I agree that the increasingly implausible "important" revocation delays are
entirely worthless. I think a real and meaningful solution is what is
being more consistently pursued, and that's to distrust CAs that are not
adhering to the set of expectations. There's no reason to believe the
"impact" argument, particularly when it's one that both the Subscriber and
the CA can and should have avoided, and CAs that continue to make that
argument are increasingly showing that they're not working in the best
interests of Relying Parties (see above) or Subscribers (by "annoying" them
or lying to them), and that's worthy of distrust.

Ryan Sleevi

Nov 26, 2018, 6:55:01 PM
to Jakob Bohm, mozilla-dev-security-policy
On Mon, Nov 26, 2018 at 12:12 PM Jakob Bohm via dev-security-policy <
dev-secur...@lists.mozilla.org> wrote:

> 1. Having a spare certificate ready (if done with proper security, e.g.
> a separate key) from a different CA may unfortunately conflict with
> badly thought out parts of various certificate "pinning" standards.
>

You blame the standards, but that seems an operational risk that the site
(knowingly) took. That doesn't make a compelling argument.


> 2. Being critical from a society perspective (e.g. being the contact
> point for a service to help protect the planet), doesn't mean that the
> people running such a service can be expected to be IT superstars
> capable of dealing with complex IT issues such as unscheduled
> certificate replacement due to no fault of their own.
>

That sounds like an operational risk the site (knowingly) took. Solutions
for automation exist, as do concepts such as "hiring multiple people"
(having a NOC/SOC). I see nothing to argue that a single person is somehow
the risk here.


> 3. Not every site can be expected to have the 24/7 staff on hand to do
> "top security credentials required" changes, for example a high-
> security end site may have a rule that two senior officials need to
> sign off on any change in cryptographic keys and certificates, while a
> limited-staff end-site may have to schedule a visit from their outside
> security consultant to perform the certificate replacement.
>

This is exactly describing a known risk that the site took, accepting the
tradeoffs. I fail to see a compelling argument that there should be no
tradeoffs - given the harm presented to the ecosystem - and if sites want
to make such policies, rather than promoting automation and CI/CD, then it
seems that's a risk they should bear and make an informed choice.

Thus I would be all for an official BR ballot to clarify/introduce
> that 24 hour revocation for non-compliance doesn't apply to non-
> dangerous technical violations.
>

As discussed elsewhere, there is no such thing as "non-dangerous technical
violations". It is a construct, much like "clean coal", that has an
appealing turn of phrase, but without the evidence to support it.


> Another category that would justify a longer CA response time would be a
> situation where a large batch of certificates need to be revalidated due
> to a weakness in validation procedures (such as finding out that a
> validation method had a vulnerability, but not knowing which if any of
> the validated identities were actually fake). For example to recheck a
> typical domain-control method, a CA would have to ask each certificate
> holder to respond to a fresh challenge (lots of manual work by end
> sites), then do the actual check (automated).


Like the other examples, this is not at all compelling. Solutions exist to
mitigate this risk entirely. CAs and their Subscribers that choose not to
avail themselves of these methods - for whatever the reason - are making an
informed market choice about these. If they're not informed, that's on the
CAs. If they are making the choice, that's on the Subscribers.

There's zero reason to change, especially when such revalidation can be,
and is, being done automatically.

Enrico Entschew

Nov 27, 2018, 12:17:11 PM
to mozilla-dev-s...@lists.mozilla.org
On Monday, November 26, 2018 at 18:34:38 UTC+1, Jakob Bohm wrote:

> In addition to this, would you add the following:
>
> - Daily checks of crt.sh (or some other existing tool) if
> additional such certificates are erroneously issued before
> the automated countermeasures are in place?

Thank you, Jakob. This is what we intended to do. We are monitoring crt.sh at least twice a day from now on.

As to your other point, we do restrict the serial number element, and the error occurred precisely in defining the constraints for this field. As mentioned above, we plan to make adjustments to our systems to prevent this kind of error in the future.

Enrico Entschew

Nov 27, 2018, 3:47:16 PM
to mozilla-dev-s...@lists.mozilla.org
We acknowledge that this mis-issuance was caused by technical and organizational issues, which we will remedy as fast as possible. We do realize that the importance of timely revocation of certificates for the WebPKI environment is not fully understood by our customers.

In addition to the software improvements, we now monitor crt.sh at least twice a day for certificates with similar errors. We will also reach out to our customers to explain the impact of certificate revocation and the risk it poses to the availability of their services.

We really appreciate the comments on this issue put forward by the community.

Buschart, Rufus

Nov 28, 2018, 2:45:42 AM
to Enrico Entschew, mozilla-dev-s...@lists.mozilla.org
To simplify the process of monitoring crt.sh, we at Siemens have implemented a little web service which directly queries the crt.sh DB and returns the errors as JSON. This way you don't have to parse HTML files and can integrate it directly into your monitoring. Maybe this function is of interest to some other CAs:

https://eo0kjkxapi.execute-api.eu-central-1.amazonaws.com/prod/crtsh-monitor?caID=52410&daystolookback=30&excluderevoked=false

To monitor your CA, replace the caID with your CA's ID from crt.sh. In case you receive an endpoint time-out message, try again; the crt.sh DB often returns time-outs. For more details or feature requests, have a look at its GitHub repo: https://github.com/RufusJWB/crt.sh-monitor
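
A minimal consumer of the endpoint might look like the following Python sketch (the response schema isn't documented here, so it simply prints the parsed JSON; the parameters are the ones from the example URL above):

    import requests

    URL = ("https://eo0kjkxapi.execute-api.eu-central-1.amazonaws.com"
           "/prod/crtsh-monitor")
    params = {"caID": 52410, "daystolookback": 30, "excluderevoked": "false"}

    # The crt.sh DB behind this often times out, so use a generous
    # timeout and retry on failure.
    resp = requests.get(URL, params=params, timeout=60)
    resp.raise_for_status()
    print(resp.json())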


With best regards,
Rufus Buschart

Siemens AG
Information Technology
Human Resources
PKI / Trustcenter
GS IT HR 7 4
Hugo-Junkers-Str. 9
90411 Nuernberg, Germany
Tel.: +49 1522 2894134
mailto:rufus.b...@siemens.com
www.twitter.com/siemens

www.siemens.com/ingenuityforlife

Siemens Aktiengesellschaft: Chairman of the Supervisory Board: Jim Hagemann Snabe; Managing Board: Joe Kaeser, Chairman, President and Chief Executive Officer; Roland Busch, Lisa Davis, Klaus Helmrich, Janina Kugel, Cedrik Neike, Michael Sen, Ralf P. Thomas; Registered offices: Berlin and Munich, Germany; Commercial registries: Berlin Charlottenburg, HRB 12300, Munich, HRB 6684; WEEE-Reg.-No. DE 23691322


Pedro Fuentes

Nov 28, 2018, 7:07:37 AM
to mozilla-dev-s...@lists.mozilla.org
Hi Rufus,
I got an internal server error on that link, but I really appreciate your post and the link to the code!
Pedro

Dimitris Zacharopoulos

Nov 28, 2018, 7:49:05 AM
to Pedro Fuentes, mozilla-dev-s...@lists.mozilla.org

As pointed out by one of my engineers, there is a simpler way: a direct
query [1] against the read-only crt.sh database. Using Rufus' example:

SELECT get_ca_name_attribute(issuer_ca_id, 'organizationName') issuer_o, ISSUER_CA_ID, FATAL_CERTS, ERROR_CERTS, WARNING_CERTS FROM lint_1week_summary WHERE LINTER = 'cablint' AND ISSUER_CA_ID=52410;

Anyone can automate this process with tools they are more familiar with.
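
For example, a sketch using psycopg2 (the host, database and user below are crt.sh's publicly documented read-only connection details; verify them before relying on this):

    import psycopg2  # pip install psycopg2-binary

    conn = psycopg2.connect(host="crt.sh", port=5432,
                            dbname="certwatch", user="guest")
    conn.set_session(readonly=True, autocommit=True)
    with conn.cursor() as cur:
        # Same query as above, parameterized on the CA ID.
        cur.execute("""
            SELECT get_ca_name_attribute(issuer_ca_id, 'organizationName'),
                   issuer_ca_id, fatal_certs, error_certs, warning_certs
              FROM lint_1week_summary
             WHERE linter = 'cablint' AND issuer_ca_id = %s
        """, (52410,))
        for row in cur.fetchall():
            print(row)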


Dimitris.

[1] https://groups.google.com/forum/#!topic/crtsh/sUmV0mBz8bQ





Nick Lamb

Nov 28, 2018, 3:10:08 PM
to dev-secur...@lists.mozilla.org, ry...@sleevi.com
On Mon, 26 Nov 2018 18:47:25 -0500
Ryan Sleevi via dev-security-policy
<dev-secur...@lists.mozilla.org> wrote:
> CAs have made the case - it was not accepted.
>
> On a more fundamental and philosophical level, I think this is
> well-intentioned but misguided. Let's consider that the issue is one
> that the CA had the full power-and-ability to prevent - namely, they
> violated the requirements and misissued. A CA is only in this
> situation if they are a bad CA - a good CA will never run the risk of
> "annoying" the customer.

I would sympathise with this position if we were considering, say, a
problem that had caused a CA to issue certs with the exact same mistake
for 18 months, rather than, as I understand here, a single certificate.

Individual human errors are inevitable at a "good CA". We should not
design systems, including policy making, that assume all errors will be
prevented because that contradicts the assumption that human error is
inevitable. Although it is often used specifically to mean operator
error, human error can be introduced anywhere. A requirements document
which erroneously says a particular Unicode codepoint is permitted in a
field when it should be forbidden is still human error. A department
head who feels tired and signs off on a piece of work that actually
didn't pass tests, still human error.

In true failure-is-death scenarios like fly-by-wire aircraft controls
this assumption means extraordinary methods must be used in order to
minimise the risk of inevitable human error resulting in real world
systems failure. Accordingly the resulting systems are exceptionally
expensive. Though the Web PKI is important, we should not imagine for
ourselves that it warrants this degree of care and justifies this level
of expense even at a "good CA".

What we can require in policy - and as I understand it Mozilla policy
does require - is that the management (also humans) take steps to
report known problems and prevent them from recurring. This happened
here.

> This presumes that the customer cannot take steps to avoid this.
> However, as suggested by others, the customer could have minimized or
> eliminated annoyance, such as by ensuring they have a robust system
> to automate the issuance/replacement of certificates. That they
> didn't is an operational failure on their fault.

I agree with this part.

> This presumes that there is "little or no risk to relying parties."
> Unfortunately, they are by design not a stakeholder in those
> conversations

It does presume this, and I've seen no evidence to the contrary. Also I
think I am in fact a stakeholder in this conversation anyway?

> I agree that it's entirely worthless the increasingly implausible
> "important" revocations. I think a real and meaningful solution is
> what is being more consistently pursued, and that's to distrust CAs
> that are not adhering to the set of expectations.

I don't think root distrust is an appropriate response, in the current
state, to a single incident of this nature. This sort of thing is,
indeed, why you may remember me suggesting that Mozilla needs other
mechanisms short of distrust in its arsenal.

Nick.

Jakob Bohm

Nov 28, 2018, 4:41:50 PM
to mozilla-dev-s...@lists.mozilla.org
On 27/11/2018 00:54, Ryan Sleevi wrote:
> On Mon, Nov 26, 2018 at 12:12 PM Jakob Bohm via dev-security-policy <
> dev-secur...@lists.mozilla.org> wrote:
>
>> 1. Having a spare certificate ready (if done with proper security, e.g.
>> a separate key) from a different CA may unfortunately conflict with
>> badly thought out parts of various certificate "pinning" standards.
>>
>
> You blame the standards, but that seems an operational risk that the site
> (knowingly) took. That doesn't make a compelling argument.
>

I blame those standards for forcing every site to choose between two
unfortunate risks, in this case either the risks prevented by those
"pinning" mechanisms and the risks associated with having only one
certificate.

The fact that sites are forced to make that choice makes it unfair to
presume they should always choose to prevent whichever risk is discussed
in a given context. Groups discussing other risks could equally unfairly
blame sites for not using one of those "pinning" mechanisms.

>
>> 2. Being critical from a society perspective (e.g. being the contact
>> point for a service to help protect the planet), doesn't mean that the
>> people running such a service can be expected to be IT superstars
>> capable of dealing with complex IT issues such as unscheduled
>> certificate replacement due to no fault of their own.
>>
>
> That sounds like an operational risk the site (knowingly) took. Solutions
> for automation exist, as do concepts such as "hiring multiple people"
> (having a NOC/SOC). I see nothing to argue that a single person is somehow
> the risk here.
>

The number of people in the world who can do this is substantially
smaller than the number of sites that might need them. We must
therefore, by necessity, accept that some such sites will not hire such
people - let alone multiple such people for their own exclusive use.

Automating certificate deployment (as you often suggest) lowers
operational security, as it necessarily grants read/write access to
the certificate data (including private key) to an automated, online,
unsupervised system.

Allowing multiple persons to replace the certificates also lowers
operational security, as it (by definition) grants multiple persons
read/write access to the certificate data.

Under the current and past CA model, certificate and private key
replacement is a rare (once/2 years) operation that can be done
manually and scheduled weeks in advance, except for unexpected
failures (such as a CA messing up).


>
>> 3. Not every site can be expected to have the 24/7 staff on hand to do
>> "top security credentials required" changes, for example a high-
>> security end site may have a rule that two senior officials need to
>> sign off on any change in cryptographic keys and certificates, while a
>> limited-staff end-site may have to schedule a visit from their outside
>> security consultant to perform the certificate replacement.
>>
>
> This is exactly describing a known risk that the site took, accepting the
> tradeoffs. I fail to see a compelling argument that there should be no
> tradeoffs - given the harm presented to the ecosystem - and if sites want
> to make such policies, rather than promoting automation and CI/CD, then it
> seems that's a risk they should bear and make an informed choice.
>

The trade off would have been made against the risk of the site itself
mishandling its private key (e.g. a site breach). Not against force
majeure situations such as a CA recalling a certificate out of turn.

It is generally not fair to say "that we may impose a difficult
situation is a risk that the site took".

> Thus I would be all for an official BR ballot to clarify/introduce
>> that 24 hour revocation for non-compliance doesn't apply to non-
>> dangerous technical violations.
>>
>
> As discussed elsewhere, there is no such thing as "non-dangerous technical
> violations". It is a construct, much like "clean coal", that has an
> appealing turn of phrase, but without the evidence to support it.
>

That is simply not true. The case at hand is a very good example, as
the problem is that a text field used only for display purposes by
current software, and generally requiring either human interpretation or
yet-to-be-defined parseable definitions, was given an out-of-range
value.

Unless someone can point out a real-world piece of production software
which causes security problems when presented with the particular out-
of-range value, or that the particular out-of-range value would
reasonably mislead human relying parties, then the dangers are entirely
hypothetical and/or political.

>
>> Another category that would justify a longer CA response time would be a
>> situation where a large batch of certificates need to be revalidated due
>> to a weakness in validation procedures (such as finding out that a
>> validation method had a vulnerability, but not knowing which if any of
>> the validated identities were actually fake). For example to recheck a
>> typical domain-control method, a CA would have to ask each certificate
>> holder to respond to a fresh challenge (lots of manual work by end
>> sites), then do the actual check (automated).
>
>
> Like the other examples, this is not at all compelling. Solutions exist to
> mitigate this risk entirely. CAs and their Subscribers that choose not to
> avail themselves of these methods - for whatever the reason - are making an
> informed market choice about these. If they're not informed, that's on the
> CAs. If they are making the choice, that's on the Subscribers.
>

You have yet to point out methods that work in practice and without risk
for organizations not dedicated to a large-scale devOps model like your
employer's.

For example, every BR permitted automated domain validation method
involves a challenge-response interaction with the site owner, who must
not (to prevent rogue issuance) respond to that interaction except
during planned issuance.

Thus any unscheduled revalidation of domain ownership would, by
necessity, involve contacting the site owner and convincing them this is
not a phishing attempt.

Some ACME protocols may contain specific authenticated ways for the CA
to revalidate out-of-schedule, but this would be outside the norm.

> There's zero reason to change, especially when such revalidation can be,
> and is, being done automatically.
>


Wayne Thayer

Nov 28, 2018, 5:15:13 PM
to Nick Lamb, MDSP, Ryan Sleevi
The way that we currently handle these types of issues is about as good as
we're going to get. We have a [recently relaxed but still] fairly stringent
set of rules around revocation in the BRs. This is necessary and proper
because slow/delayed revocation can clearly harm our users. It was
difficult to gain consensus within the CAB Forum on allowing even 5 days in
some circumstances - I'm confident that something like 28 days would be a
non-starter. I'm also confident that CAs will always take the entire time
permitted to perform revocations, regardless of the risk, because it is in
their interest to do so (that is not mean to be a criticism of CAs so much
as a statement that CAs exist to serve their customers, not our users). I'm
also confident that any attempt to define "low risk" misissuance would just
incentivize CAs to stop treating misissuance as a serious offense and we'd
be back to where we were prior to the existence of linters.

CAs obviously do choose to violate the revocation time requirements. I do
not believe this is generally based on a thorough risk analysis, but in
practice it is clear that they do have some discretion. I am not aware of a
case (yet) in which Mozilla has punished a CA solely for violating a
revocation deadline. When that happens, the violation is documented in a
bug and should appear on the CA's next audit report/attestation statement.
From there, the circumstances (how many certs?, what was the issue?, was it
previously documented?, is this a pattern of behavior?) have to be
considered on a case-by-case basis to decide a course of action. I realize
that this is not a very satisfying answer to the questions that are being
raised, but I do think it's the best answer.

- Wayne

Dimitris Zacharopoulos

Nov 29, 2018, 2:16:58 AM
to Wayne Thayer, Nick Lamb, Ryan Sleevi, MDSP
Mandating that CAs disclose revocation situations that exceed the 5-day
requirement with some risk analysis information, might be a good place
to start. Of course, this should be independent of a "mis-issuance
incident report". By collecting this information, Mozilla would be in a
better position to evaluate the challenges CAs face with revocations
*initiated by the CA* without adequate warning to the Subscriber. I
don't consider 5 days (they are not even working days) to be an adequate
warning period for a large organization with slow reflexes and long
procedures. Once Mozilla collects more information, you might be able to
see possible patterns in various CAs and decide what is acceptable and
what is not, and create policy rules accordingly.

For example, if many CAs violate the 5-day rule for revocations related
to improper subject information encoding, out-of-range values, wrong syntax
and the like, Mozilla or the BRs might decide to have a separate category
with a different time frame and/or different actions.

It is not the first time we talk about this and it might be worth
exploring further.

As a general comment, IMHO when we talk about RP risk when a CA issues a
Certificate with, say, more than 64 characters in an OU field, that
would only pose risk to Relying Parties *that want to interact with that
particular Subscriber*, not the entire Internet. These RPs *might*
encounter compatibility issues depending on their browser and will
either contact the Subscriber and notify them that their web site
doesn't work or they will do nothing. It's similar to a situation where
a site operator forgets to send the intermediate CA Certificate in the
chain. These particular RPs will fail to get TLS working when they visit
the Subscriber's web site.


Dimitris.

Ryan Sleevi

Nov 29, 2018, 11:39:49 AM
to Dimitris Zacharopoulos, Wayne Thayer, Nick Lamb, Ryan Sleevi, MDSP
On Thu, Nov 29, 2018 at 2:16 AM Dimitris Zacharopoulos <ji...@it.auth.gr>
wrote:

> Mandating that CAs disclose revocation situations that exceed the 5-day
> requirement with some risk analysis information, might be a good place
> to start.


This was proposed several times by Google in the Forum, and consistently
rejected, unfortunately.


> I don't consider 5 days (they are not even working days) to be adequate
> warning period to a large organization with slow reflexes and long
> procedures.


Phrased differently: You don't think large organizations are currently
capable, and believe the rest of the industry should accommodate that.

Do you believe these organizations could respond within 5 days if their
internet connectivity was lost?


> For example, if many CAs violate the 5-day rule for revocations related
> to improper subject information encoding, out of range, wrong syntax and
> that sort, Mozilla or the BRs might decide to have a separate category
> with a different time frame and/or different actions.
>

Given the security risks in this, I think this is extremely harmful to the
ecosystem and to users.

It is not the first time we talk about this and it might be worth
> exploring further.
>

I don't think any of the facts have changed. We've discussed for several
years that CAs have the opportunity to provide this information, and
haven't, so I don't think it's at all proper to suggest starting a
conversation without structured data. CAs that are passionate about this
could have supported such efforts in the Forum to provide this information,
or could have demonstrated doing so on their own. I don't think it would at
all be productive to discuss these situations in abstract hypotheticals, as
some of the discussions here try to do - without data, that would be an
extremely unproductive use of time.


> As a general comment, IMHO when we talk about RP risk when a CA issues a
> Certificate with -say- longer than 64 characters in an OU field, that
> would only pose risk to Relying Parties *that want to interact with that
> particular Subscriber*, not the entire Internet.


No. This is demonstrably and factually wrong.

First, we already know that technical errors are a strong sign that the
policies and practices themselves are not being followed - both the
validation activities and the issuance activities result from the CA
following it's practices and procedures. If a CA is not following its
practices and procedures, that's a security risk to the Internet, full stop.

Second, it presumes (incorrectly) that interoperability is not something
valuable. That is, if say the three existing, most popular implementations
all do not check whether or not it's longer than 64 characters (for
example), and a fourth implementation would like to come along, they cannot
read the relevant standards and implement something interoperable. This is
because 'interoperability' is being redefined as 'ignoring' the standard -
which defeats the purposes of standards to begin with. These choices - to
permit deviations - creates risks for the entire ecosystem, because there's
no longer interoperability. This is equally captured in
https://tools.ietf.org/html/draft-iab-protocol-maintenance-01

The premise to all of this is that "CAs shouldn't have to follow rules,
browsers should just enforce them," which is shocking and unfortunate. It's
like saying "It's OK to lie about whatever you want, as long as you don't
get caught" - no, that line of thinking is just as problematic for morality
as it is for technical interoperability. CAs that routinely violate the
standards create risk, because they have full trust on the Internet. If the
argument is that the CA's actions (of accidentally or deliberately
introducing risk) is the problem, but that we shouldn't worry about
correcting the individual certificate, that entirely misses the point that
without correcting the certificate, there's zero incentive to actually
follow the standards, and as a result, that creates risk for everyone.
Revocation, if you will, is the "less worse" alternative to complete
distrust - it only affects that single certificate, rather than every one
of the certificates the CA has issued. The alternative - not revoking -
simply says that it's better to look at distrust options, and that's more
risk for everyone.

Finally, CAs are terrible at assessing the risk to RPs. For example,
negative serial numbers were prolific prior to the linters, and those have
issues in as much as they are, for some systems, irrevocable. This is
because those systems implemented the standards correctly - serials are
positive INTEGERs - yet had to account for the fact that CAs are improperly
encoding them, such as by "making" them positive (adding the leading zero).
This leading zero then doesn't get stripped off when looking up by Issuer &
Serial Number, because they're using the "spec-correct" serial rather than
the "issuer-broken" serial. That's an example where the certificate
"works", no report is filed, but the security and ecosystem properties are
fatally compromised. The alternatives for such implementation are:
1) Reject such certificates (but see above about market forces and
interoperability)
2) Correct both the certificate and the CRL/OCSP serial number (which then
creates risk because you're not actually checking _any_ certificates true
serial)
3) Allow negative serial numbers (which then makes it harder for others to
do #1)
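
To make the encoding problem concrete, here is a small Python illustration
with a hypothetical two-byte serial: DER INTEGERs are two's-complement, so
a value whose leading bit is set decodes as negative unless the encoder
prepends a 0x00 pad byte.

    raw = bytes.fromhex("ff01")  # serial bytes with the high bit set

    # What the spec says these bytes mean (negative, hence non-compliant):
    as_encoded = int.from_bytes(raw, "big", signed=True)           # -255
    # What a "fix it up" implementation stores after zero-padding:
    as_padded = int.from_bytes(b"\x00" + raw, "big", signed=True)  # 65281

    assert as_encoded == -255 and as_padded == 65281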

As I said, CAs have been terrible at assessing risk to the ecosystem for
their decisions. The page at
https://wiki.mozilla.org/SecurityEngineering/mozpkix-testing#Things_for_CAs_to_Fix
shows how bad such interoperability harms improvements - for example, all
of these hacks that Mozilla had to add in order to ship a more secure, more
efficient certificate verifier.

Nick Lamb

Nov 30, 2018, 4:48:16 AM
to dev-secur...@lists.mozilla.org, Jakob Bohm
On Wed, 28 Nov 2018 22:41:37 +0100
Jakob Bohm via dev-security-policy
<dev-secur...@lists.mozilla.org> wrote:

> I blame those standards for forcing every site to choose between two
> unfortunate risks, in this case either the risks prevented by those
> "pinning" mechanisms and the risks associated with having only one
> certificate.

HTTPS Key Pinning (HPKP) is deprecated by Google and is widely
considered a failure because it acts as a foot-gun and (more seriously
but less likely in practice) enables sites to be held to ransom by bad
guys.

Mostly though, what I want to focus on is a big hole in your knowledge
of what's available today, which I'd argue is likely significant in
that probably most certificate Subscribers don't know about it, and
that's something the certificate vendors could help to educate them
about and/or deliver products to help them use.

> Automating certificate deployment (as you often suggest) lowers
> operational security, as it necessarily grants read/write access to
> the certificate data (including private key) to an automated, online,
> unsupervised system.

No!

This system does not need access to private keys. Let us take ACME as
our example throughout, though nothing about what I'm describing needs
ACME per se, it's simply a properly documented protocol for automation
that complies with CA/B rules.

The ACME CA expects a CSR, signed with the associated private key, but
it does not require that this CSR be created fresh during validation +
issuance. A Subscriber can as they wish generate the CSR manually,
offline and with full supervision. The CSR is a public document
(revealing it does not violate any cryptographic assumptions). It is
entirely reasonable to create one CSR when the key pair is minted and
replace it only in a scheduled, predictable fashion along with the keys
unless a grave security problem occurs with your systems.
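
As a sketch of that offline step - using the Python "cryptography"
library, with an illustrative name and a single SAN - generating the key
and CSR needs no connection to anything:

    from cryptography import x509
    from cryptography.hazmat.primitives import hashes, serialization
    from cryptography.hazmat.primitives.asymmetric import rsa
    from cryptography.x509.oid import NameOID

    # Generate the key pair offline, under whatever supervision policy
    # requires.
    key = rsa.generate_private_key(public_exponent=65537, key_size=2048)

    csr = (
        x509.CertificateSigningRequestBuilder()
        .subject_name(x509.Name(
            [x509.NameAttribute(NameOID.COMMON_NAME, "example.org")]))
        .add_extension(
            x509.SubjectAlternativeName([x509.DNSName("example.org")]),
            critical=False)
        .sign(key, hashes.SHA256())
    )

    # The CSR is public; only the key material needs to stay protected.
    with open("example.org.csr", "wb") as f:
        f.write(csr.public_bytes(serialization.Encoding.PEM))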

ACME involves a different private key, possessed by the subscriber/
their agent only for interacting securely with ACME, the ACME client
needs this key when renewing, but it doesn't put the TLS certificate key
at risk.

Certificates are public information by definition. No new risk there.


> Allowing multiple persons to replace the certificates also lowers
> operational security, as it (by definition) grants multiple persons
> read/write access to the certificate data.

Again, certificates themselves are public information and this does not
require access to the private keys.

> Under the current and past CA model, certificate and private key
> replacement is a rare (once/2 years) operation that can be done
> manually and scheduled weeks in advance, except for unexpected
> failures (such as a CA messing up).

This approach, which has been used at some of my past employers,
inevitably results in systems where the certificates expire "by
mistake". Recriminations and insistence that lessons will be learned
follow, and then of course nothing is followed up and the problem
recurs.

It's a bad idea, a popular one, but still a bad idea.

> For example, every BR permitted automated domain validation method
> involves a challenge-response interaction with the site owner, who
> must not (to prevent rogue issuance) respond to that interaction
> except during planned issuance.

It is entirely possible and theoretically safe to configure ACME
responders entirely passively. You can see this design in several
popular third party ACME clients.

The reason it's theoretically safe is that ACME's design ensures the
validation server (for example Let's Encrypt's Boulder) unavoidably
verifies that the validation response is from the correct ACME account
holder.

So if bad guys request issuance, the auto-responder will present a
validation response for the good guy account, which does not match and
issuance will not occur. The bad guys will be told their validation
failed and they've got the keys wrong. Which of course they can't fix
since they've no idea what the right ACME account private key is.

For http-01 at least, you can even configure this without the
auto-responder having any private knowledge at all. Since this part is
just playing back a signature, our basic cryptographic assumptions mean
that we can generate the signature offline and then paste it into the
auto-responder. At least one popular ACME client offers this behaviour.
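
Strictly, what gets played back for http-01 is the key authorization:
the challenge token joined to a hash of the *public* ACME account key
(RFC 8555, RFC 7638), which is why the responder needs no secret. A
sketch of computing it offline (the placeholder values are hypothetical):

    import base64, hashlib, json

    def b64url(data: bytes) -> str:
        return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

    # Public JWK of the ACME account key (RSA shown; modulus is a
    # placeholder).
    jwk = {"e": "AQAB", "kty": "RSA", "n": "<base64url-encoded-modulus>"}

    # RFC 7638 thumbprint: SHA-256 over the canonical JSON of the
    # required members, with keys sorted and no whitespace.
    thumbprint = b64url(hashlib.sha256(json.dumps(
        jwk, separators=(",", ":"), sort_keys=True).encode()).digest())

    token = "<token-from-the-CA-challenge>"
    # Serve this string at /.well-known/acme-challenge/<token>:
    key_authorization = token + "." + thumbprint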

For a huge outfit like Google or Facebook that can doubtless afford to
have an actual "certificate team" this would not be an appropriate
measure, but at a smaller business it seems entirely reasonable.


> Thus any unscheduled revalidation of domain ownership would, by
> necessity, involve contacting the site owner and convincing them this
> is not a phishing attempt.

See above, this works today for lots of ACME validated domains.

> Some ACME protocols may contain specific authenticated ways for the
> CA to revalidate out-of-schedule, but this would be outside the norm.

Just revalidating, though it seems to be a popular trick for CAs, is
not what I had in mind since it wouldn't help here. What needs to
happen is a fresh issuance, and ACME can make that pretty trivial as I
described.

Nick.

Eric Mill

Dec 1, 2018, 4:22:13 PM
to Jakob Bohm, mozilla-dev-s...@lists.mozilla.org
On Wed, Nov 28, 2018 at 4:41 PM Jakob Bohm via dev-security-policy <
dev-secur...@lists.mozilla.org> wrote:

> On 27/11/2018 00:54, Ryan Sleevi wrote:
> > On Mon, Nov 26, 2018 at 12:12 PM Jakob Bohm via dev-security-policy <
> > dev-secur...@lists.mozilla.org> wrote:
> >
> >> 2. Being critical from a society perspective (e.g. being the contact
> >> point for a service to help protect the planet), doesn't mean that
> the
> >> people running such a service can be expected to be IT superstars
> >> capable of dealing with complex IT issues such as unscheduled
> >> certificate replacement due to no fault of their own.
> >>
> >
> > That sounds like an operational risk the site (knowingly) took. Solutions
> > for automation exist, as do concepts such as "hiring multiple people"
> > (having a NOC/SOC). I see nothing to argue that a single person is
> somehow
> > the risk here.
> >
>
> The number of people in the world who can do this is substantially
> smaller than the number of sites that might need them. We must
> therefore, by necessity, accept that some such sites will not hire such
> people, or worse multiple such people for their own exclusive use.
>
> Automating certificate deployment (as you often suggest) lowers
> operational security, as it necessarily grants read/write access to
> the certificate data (including private key) to an automated, online,
> unsupervised system.
>

Respectfully, this isn't accurate. Automated certificate deployment and
rotation is a best practice for high-functioning enterprises, and can be
done without exposing general read/write access to other systems. I've seen
automated certificate rotation implemented in several federal government
agencies, and (maybe more importantly) have seen many more agencies let
their certificates expire and impact the security of public services due to
a lack of automation.

Nick already described how the ACME protocol can be automated without
exposing the TLS private key, but more generally, organizations can use
scoped permissioning to grant individual components only the specific
access they need to accomplish their job. As an example, customers of
Amazon Web Services can use the IAM permissions framework to establish
granular permissions that mitigate the impact of component compromise.
Enterprises relying on self-managed infrastructure are free to implement a
similar system.

For a government example of automated certificate issuance, see
https://cloud.gov/docs/services/cdn-route/, which is a FedRAMPed service
whose security authorization is signed off on by the Departments of Defense
and Homeland Security.

Societally important organizations who don't specialize in technology
(which is most of them), or for whatever reason can't feasibly automate
their certificate operations, should definitely be relying on
infrastructure managed by third parties which do specialize in this
technology, be it basic site hosting like Squarespace or more sophisticated
cloud services.

In other words, no organization has an excuse to not be able to rotate a
certificate given 5 days' notice. The fact that many large organizations
continue to have a problem with this doesn't make it any more excusable.

-- Eric


> Allowing multiple persons to replace the certificates also lowers
> operational security, as it (by definition) grants multiple persons
> read/write access to the certificate data.
>

Jakob Bohm

Dec 3, 2018, 7:39:17 PM
to mozilla-dev-s...@lists.mozilla.org
A few clarifications below

On 30/11/2018 10:48, Nick Lamb wrote:
> On Wed, 28 Nov 2018 22:41:37 +0100
> Jakob Bohm via dev-security-policy
> <dev-secur...@lists.mozilla.org> wrote:
>
>> I blame those standards for forcing every site to choose between two
>> unfortunate risks, in this case either the risks prevented by those
>> "pinning" mechanisms and the risks associated with having only one
>> certificate.
>
> HTTPS Key Pinning (HPKP) is deprecated by Google and is widely
> considered a failure because it acts as a foot-gun and (more seriously
> but less likely in practice) enables sites to be held to ransom by bad
> guys.
>
> Mostly though, what I want to focus on is a big hole in your knowledge
> of what's available today, which I'd argue is likely significant in
> that probably most certificate Subscribers don't know about it, and
> that's something the certificate vendors could help to educate them
> about and/or deliver products to help them use.
>

Interesting. What is that hole?

>> Automating certificate deployment (as you often suggest) lowers
>> operational security, as it necessarily grants read/write access to
>> the certificate data (including private key) to an automated, online,
>> unsupervised system.
>
> No!
>
> This system does not need access to private keys. Let us take ACME as
> our example throughout, though nothing about what I'm describing needs
> ACME per se, it's simply a properly documented protocol for automation
> that complies with CA/B rules.

It certainly needs the ability to change private keys (as reusing private
keys for new certificates is bad practice and shouldn't be automated).

This means that some part of the overall automated system needs the ability
to generate fresh keys, sign CSRs, and cause servers to switch to those new
keys.

And because this discussion entails triggering all that at an out-of-schedule
time, having a "CSR pre-generation ceremony" every 24 months (the normal
reissue schedule for EV certs) will provide limited ability to handle
out-of-schedule certificate replacement (because it is also bad practice to
have private keys with a design lifetime of 24 months lying around for 48
months prior to planned expiry).


>
> The ACME CA expects a CSR, signed with the associated private key, but
> it does not require that this CSR be created fresh during validation +
> issuance. A Subscriber can as they wish generate the CSR manually,
> offline and with full supervision. The CSR is a public document
> (revealing it does not violate any cryptographic assumptions). It is
> entirely reasonable to create one CSR when the key pair is minted and
> replace it only in a scheduled, predictable fashion along with the keys
> unless a grave security problem occurs with your systems.
>
> ACME involves a different private key, possessed by the subscriber/
> their agent only for interacting securely with ACME, the ACME client
> needs this key when renewing, but it doesn't put the TLS certificate key
> at risk.
>
> Certificates are public information by definition. No new risk there.
>

By definition, the strength of public keys, especially TLS RSA signing
keys used with PFS suites, involves a security tradeoff between the
time that attackers have to break/factor the public key and the slowness
of handling TLS connections with current generation standard hardware and
software.

The current WebPKI/BR tradeoff/compromise is set at 2048 bit keys valid
for about 24 months.



>
>> Allowing multiple persons to replace the certificates also lowers
>> operational security, as it (by definition) grants multiple persons
>> read/write access to the certificate data.
>
> Again, certificates themselves are public information and this does not
> require access to the private keys.

It requires write access to the private keys, even if the operators might
not need to see those keys; many real world systems don't allow granting
"install new private key" permission without "see new private key"
permission and "choose arbitrary private key" permission.

Also, many real world systems don't allow installing a new certificate
for an existing key without reinstalling the matching private key, simply
because that's the interface.

Traditional military encryption systems are built without these
limitations, but civilian systems are often not.


>
>> Under the current and past CA model, certificate and private key
>> replacement is a rare (once/2 years) operation that can be done
>> manually and scheduled weeks in advance, except for unexpected
>> failures (such as a CA messing up).
>
> This approach, which has been used at some of my past employers,
> inevitably results in systems where the certificates expire "by
> mistake". Recriminations and insistence that lessons will be learned
> follow, and then of course nothing is followed up and the problem
> recurs.
>
> It's a bad idea, a popular one, but still a bad idea.

This is why good CAs send out reminder e-mails in advance. And why
one should avoid CAs that use that contact point for infinite spam
about new services.

>
>> For example, every BR permitted automated domain validation method
>> involves a challenge-response interaction with the site owner, who
>> must not (to prevent rogue issuance) respond to that interaction
>> except during planned issuance.
>
> It is entirely possible and theoretically safe to configure ACME
> responders entirely passively. You can see this design in several
> popular third party ACME clients.
>
> The reason it's theoretically safe is that ACME's design ensures the
> validation server (for example Let's Encrypt's Boulder) unavoidably
> verifies that the validation response is from the correct ACME account
> holder.
>
> So if bad guys request issuance, the auto-responder will present a
> validation response for the good guy account, which does not match and
> issuance will not occur. The bad guys will be told their validation
> failed and they've got the keys wrong. Which of course they can't fix
> since they've no idea what the right ACME account private key is.

The scenario is "Bad guy requests new cert, CA properly challenges
good guy at good guy address, good guy responds positively without
reference to old good guy CSR, CA issues for bad guy CSR, bad guy
grabs new cert from anywhere and matches to bad guy private key,
bad guy does actual attack".

>
> For http-01 at least, you can even configure this without the
> auto-responder having any private knowledge at all. Since this part is
> just playing back a signature, our basic cryptographic assumptions mean
> that we can generate the signature offline and then paste it into the
> auto-responder. At least one popular ACME client offers this behaviour.

This would only work if the existing CA challenge can be re-used or there
is no CA-chosen challenge in the protocol. I have yet to look at http-01
as it doesn't fit my own usage scenarios.

>
> For a huge outfit like Google or Facebook that can doubtless afford to
> have an actual "certificate team" this would not be an appropriate
> measure, but at a smaller business it seems entirely reasonable.
>
>
>> Thus any unscheduled revalidation of domain ownership would, by
>> necessity, involve contacting the site owner and convincing them this
>> is not a phishing attempt.
>
> See above, this works today for lots of ACME validated domains.
>
>> Some ACME protocols may contain specific authenticated ways for the
>> CA to revalidate out-of-schedule, but this would be outside the norm.
>
> Just revalidating, though it seems to be a popular trick for CAs, is
> not what I had in mind since it wouldn't help here. What needs to
> happen is a fresh issuance, and ACME can make that pretty trivial as I
> described.
>

I was arguing that the general argument was invalid. Obviously
revalidation would not be useful for a malformed certificate like
the one from D-Trust.

Nick Lamb

unread,
Dec 3, 2018, 11:38:28 PM12/3/18
to dev-secur...@lists.mozilla.org, Jakob Bohm
On Tue, 4 Dec 2018 01:39:05 +0100
Jakob Bohm via dev-security-policy
<dev-secur...@lists.mozilla.org> wrote:

> A few clarifications below
> Interesting. What is that hole?

I had assumed that you weren't aware that you could just use these
systems as designed. Your follow-up clarifies that you believe doing
this is unsafe. I will endeavour to explain why you're mistaken.

But also I specifically endorse _learning by doing_. Experiment for
yourself with how easy it is to achieve auto-renewal with something like
ACME; try to request renewals against a site that's configured for
"stateless renewal" but with a new ("bad guy") key instead of your real
ACME account keys.


> It certainly needs the ability to change private keys (as reusing
> private keys for new certificates is bad practice and shouldn't be
> automated).

In which good practice document can I read that private keys should be
replaced earlier than their ordinary lifetime if new certificates are
minted during that lifetime? Does this document explain how its authors
imagine the new certificate introduces a novel risk?

[ This seems like breakthrough work to me, it implies a previously
unimagined weakness in, at least, RSA ]

You must understand that bad guys can, if they wish, construct an
unlimited number of new certificates corresponding to an existing key,
silently. Does this too introduce an unacceptable risk ? If not, why is
the risk introduced if a trusted third party mints one or more further
certificates ?

No, I think the problem here is with your imaginary "bad practice".
You have muddled the lifetime of the certificate (which relates to the
decay in assurance of subject information validated and to other
considerations) with the lifetime of the keys, see below.

> By definition, the strength of public keys, especially TLS RSA
> signing keys used with PFS suites, involves a security tradeoff
> between the time that attackers have to break/factor the public key
> and the slowness of handling TLS connections with current generation
> standard hardware and software.

This is true.

> The current WebPKI/BR tradeoff/compromise is set at 2048 bit keys
> valid for about 24 months.

Nope. The limit of 825 days (not "about 24 months") is for leaf
certificate lifetime, not for keys. It's shorter than it once was not
out of concern about bad guys breaking 2048-bit RSA but because of
concern about algorithmic agility and the lifetime of subject
information validation, mostly the former.

Subscribers are _very_ strongly urged to choose shorter, not longer
lifetimes, again not because we're worried about 2048-bit RSA (you will
notice there's no exemption for 4096-bit keys) but because of agility
and validation.

But choosing new keys every time you get a new certificate is
purely a mechanical convenience of scheduling, not a technical necessity
- like a fellow who schedules an appointment at the barber each time he
receives a telephone bill, the one thing has nothing to do with the
other.


> It requires write access to the private keys, even if the operators
> might not need to see those keys; many real world systems don't allow
> granting "install new private key" permission without "see new
> private key" permission and "choose arbitrary private key" permission.
>
> Also, many real world systems don't allow installing a new
> certificate for an existing key without reinstalling the matching
> private key, simply because that's the interface.
>
> Traditional military encryption systems are built without these
> limitations, but civilian systems are often not.

Nevertheless.

I'm sure there's a system out there somewhere which requires you to
provide certificates on a 3.5" floppy disk. But that doesn't mean
issuing certificates can reasonably be said to require a 3.5" floppy
disk, it's just those particular systems.

> This is why good CAs send out reminder e-mails in advance. And why
> one should avoid CAs that use that contact point for infinite spam
> about new services.

They do say that insanity consists of doing the same thing over and
over and expecting different results.

> The scenario is "Bad guy requests new cert, CA properly challenges
> good guy at good guy address, good guy responds positively without
> reference to old good guy CSR, CA issues for bad guy CSR, bad guy
> grabs new cert from anywhere and matches to bad guy private key,
> bad guy does actual attack".

You wrote this in response to me explaining exactly why this scenario
won't work in ACME (or any system which wasn't designed by idiots -
though having read their patent filings the commercial CAs on the whole
may, to my understanding, be taken as idiots).

I did make one error though, in using the word "signature" when this
data is not a cryptographic signature, but rather a "JWK Thumbprint".

When "good guy responds positively" that positive response includes
a Thumbprint corresponding to their ACME public key. When they're
requesting issuance this works fine because they use their ACME keys
for their requests. But when a bad guy requests issuance it won't work
because the bad guy does not know that ACME private account key. They
can choose their own of course, but that won't match the Thumbprint on
the "positive response" and so the request fails.

> This would only work if the existing CA challenge can be re-used or
> there is no CA-chosen challenge in the protocol. I have yet to look
> at http-01 as it doesn't fit my own usage scenarios.

Computers are programmable :D

Specifically in this case you:

1. Calculate your Thumbprint (off-line if appropriate) and make a note
of it.

2. Write a "program" (actually one configuration line for popular web
servers) to respond to all ACME challenges by playing back the random
challenge text with the Thumbprint appended as described in the ACME
protocol.
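
If you would rather see step 2 as code than as a web server
configuration line, a throwaway sketch (Python standard library only;
THUMBPRINT is the public value from step 1, and binding port 80 needs
the usual privileges):

  from http.server import BaseHTTPRequestHandler, HTTPServer

  THUMBPRINT = "replace-with-your-thumbprint"  # public, from step 1
  PREFIX = "/.well-known/acme-challenge/"

  class Responder(BaseHTTPRequestHandler):
      def do_GET(self):
          if not self.path.startswith(PREFIX):
              self.send_error(404)
              return
          # http-01 key authorization: the token, a dot, then the
          # account key Thumbprint. No private key material involved.
          body = (self.path[len(PREFIX):] + "." + THUMBPRINT).encode("ascii")
          self.send_response(200)
          self.send_header("Content-Type", "text/plain")
          self.send_header("Content-Length", str(len(body)))
          self.end_headers()
          self.wfile.write(body)

  HTTPServer(("", 80), Responder).serve_forever()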


Nick.

Jakob Bohm

unread,
Dec 4, 2018, 1:56:31 AM12/4/18
to mozilla-dev-s...@lists.mozilla.org
On 04/12/2018 05:38, Nick Lamb wrote:
> On Tue, 4 Dec 2018 01:39:05 +0100
> Jakob Bohm via dev-security-policy
> <dev-secur...@lists.mozilla.org> wrote:
>
>> A few clarifications below
>> Interesting. What is that hole?
>
> I had assumed that you weren't aware that you could just use these
> systems as designed. Your follow-up clarifies that you believe doing
> this is unsafe. I will endeavour to explain why you're mistaken.
>

Which systems?

> But also I specifically endorse _learning by doing_. Experiment for
> yourself with how easy it is to achieve auto-renewal with something like
> ACME; try to request renewals against a site that's configured for
> "stateless renewal" but with a new ("bad guy") key instead of your real
> ACME account keys.
>

I prefer not to experiment with live certificates. Anyway, this was
never intended to focus on the specifics of ACME, since OV issuance
isn't ACME anyway.

So returning to the typical, as-specified-in-the-BRs validation
challenges. Those generally either do not include the CSR in the
challenge, or do so in a manner that would involve active checking
rather than just trivial concatenation. These are the kind of
challenges that require the site owner to consider IF they are in a
certificate request process before responding.

>
>> It certainly needs the ability to change private keys (as reusing
>> private keys for new certificates is bad practice and shouldn't be
>> automated).
>
> In which good practice document can I read that private keys should be
> replaced earlier than their ordinary lifetime if new certificates are
> minted during that lifetime? Does this document explain how its authors
> imagine the new certificate introduces a novel risk?
>
> [ This seems like breakthrough work to me, it implies a previously
> unimagined weakness in, at least, RSA ]
>

Aligning key and certificate lifetime is generally good practice.

See for example NIST SP 1800-16B Prelim Draft 1, Section 5.1.4 which has
this to say:

"... It is possible to renew a certificate with the same public and
private keys (i.e., not rekeying during the renewal process).
However, this is only recommended when the private key is contained
with a hardware security module (HSM) validated to Federal Information
Processing Standards (FIPS) Publication 140-2 Level 2 or above"

And the operations I discuss are unlikely to purchase an expensive HSM
that isn't even future-proof. (I have checked leading brands of end-site
HSMs, and they barely go beyond current recommended key strengths.)

> You must understand that bad guys can, if they wish, construct an
> unlimited number of new certificates corresponding to an existing key,
> silently. Does this too introduce an unacceptable risk ? If not, why is
> the risk introduced if a trusted third party mints one or more further
> certificates ?
>
> No, I think the problem here is with your imaginary "bad practice".
> You have muddled the lifetime of the certificate (which relates to the
> decay in assurance of subject information validated and to other
> considerations) with the lifetime of the keys, see below.
>
>> By definition, the strength of public keys, especially TLS RSA
>> signing keys used with PFS suites, involves a security tradeoff
>> between the time that attackers have to break/factor the public key
>> and the slowness of handling TLS connections with current generation
>> standard hardware and software.
>
> This is true.
>
>> The current WebPKI/BR tradeoff/compromise is set at 2048 bit keys
>> valid for about 24 months.
>
> Nope. The limit of 825 days (not "about 24 months") is for leaf
> certificate lifetime, not for keys. It's shorter than it once was not
> out of concern about bad guys breaking 2048-bit RSA but because of
> concern about algorithmic agility and the lifetime of subject
> information validation, mostly the former.

825 days = 24 months plus ~94 days of slop; in practice CAs map this to
payment for 2 years of validity plus some allowance for overlap during
changeover.

>
> Subscribers are _very_ strongly urged to choose shorter, not longer
> lifetimes, again not because we're worried about 2048-bit RSA (you will
> notice there's no exemption for 4096-bit keys) but because of agility
> and validation.
>
> But choosing new keys every time you get a new certificate is
> purely a mechanical convenience of scheduling, not a technical necessity
> - like a fellow who schedules an appointment at the barber each time he
> receives a telephone bill, the one thing has nothing to do with the
> other.
>

See above NIST quote.

>
>> It requires write access to the private keys, even if the operators
>> might not need to see those keys; many real world systems don't allow
>> granting "install new private key" permission without "see new
>> private key" permission and "choose arbitrary private key" permission.
>>
>> Also, many real world systems don't allow installing a new
>> certificate for an existing key without reinstalling the matching
>> private key, simply because that's the interface.
>>
>> Traditional military encryption systems are built without these
>> limitations, but civilian systems are often not.
>
> Nevertheless.
>
> I'm sure there's a system out there somewhere which requires you to
> provide certificates on a 3.5" floppy disk. But that doesn't mean
> issuing certificates can reasonably be said to require a 3.5" floppy
> disk, it's just those particular systems.

I am referring to the very real facts that:

- Many "config GUI only" systems request certificate import as PKCS#12
files or similar.

- Many open source TLS servers require supplying the private key as an
unencrypted PKCS#8 PEM key file, either appended to the PEM file with
the certificate chain or as a matching file with the same file name.

Either of those requires private key access to change the certificate
(for the parallel key file case, only if following the recommendation
to rekey on each renewal).
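
To illustrate the PKCS#12 point with a sketch (Python, assuming a
recent pyca/cryptography library; file name and password are
illustrative): the import API hands the caller the private key whether
the caller wants it or not.

  from cryptography.hazmat.primitives.serialization import pkcs12

  data = open("bundle.p12", "rb").read()
  key, cert, extra_certs = pkcs12.load_key_and_certificates(data, b"password")
  # 'key' is the private key. It is inseparable from the bundle, so any
  # code able to install the certificate necessarily holds the key too.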


>
>> This is why good CAs send out reminder e-mails in advance. And why
>> one should avoid CAs that use that contact point for infinite spam
>> about new services.
>
> They do say that insanity consists of doing the same thing over and
> over and expecting different results.
>

The risk of forgetting to do the renewal (a self-inflicted risk that
occurs only at the time when this should have been in your calendar) is
substantially different than the risk of someone from the outside
suddenly demanding doing the procedure as a rush job at a completely
different time.

>> The scenario is "Bad guy requests new cert, CA properly challenges
>> good guy at good guy address, good guy responds positively without
>> reference to old good guy CSR, CA issues for bad guy CSR, bad guy
>> grabs new cert from anywhere and matches to bad guy private key,
>> bad guy does actual attack".
>
> You wrote this in response to me explaining exactly why this scenario
> won't work in ACME (or any system which wasn't designed by idiots -
> though having read their patent filings the commercial CAs on the whole
> may, to my understanding, be taken as idiots).
>
> I did make one error though, in using the word "signature" when this
> data is not a cryptographic signature, but rather a "JWK Thumbprint".
>
> When "good guy responds positively" that positive response includes
> a Thumbprint corresponding to their ACME public key. When they're
> requesting issuance this works fine because they use their ACME keys
> for their requests. But when a bad guy requests issuance it won't work
> because the bad guy does not know that ACME private account key. They
> can choose their own of course, but that won't match the Thumbprint on
> the "positive response" and so the request fails.
>

I also explained that ACME wasn't the target, and any mitigations specific
to ACME are of little relevance.

>> This would only work if the existing CA challenge can be re-used or
>> there is no CA-chosen challenge in the protocol. I have yet to look
>> at http-01 as it doesn't fit my own usage scenarios.
>
> Computers are programmable :D

And programs can have security bugs, which is a key part of the risk
discussed.

>
> Specifically in this case you:
>
> 1. Calculate your Thumbprint (off-line if appropriate) and make a note
> of it.
>
> 2. Write a "program" (actually one configuration line for popular web
> servers) to respond to all ACME challenges by playing back the random
> challenge text with the Thumbprint appended as described in the ACME
> protocol.
>

Again, this is only for your chosen ACME example, while I was referring
to traditional challenges closely matching what the BRs say should be
done.

Nick Lamb

unread,
Dec 4, 2018, 7:36:55 AM12/4/18
to dev-secur...@lists.mozilla.org, Jakob Bohm
On Tue, 4 Dec 2018 07:56:12 +0100
Jakob Bohm via dev-security-policy
<dev-secur...@lists.mozilla.org> wrote:

> Which systems?

As far as I'm aware, any of the automated certificate issuance
technologies can be used here, ACME is the one I'm most familiar with
because it is going through IETF standardisation and so we get to see
not only the finished system but all the process and discussion.

> I prefer not to experiment with live certificates. Anyway, this was
> never intended to focus on the specifics of ACME, since OV issuance
> isn't ACME anyway.

The direction of the thread was: Excuses for why a subscriber can't
manage to replace certificates in a timely fashion. Your contribution
was a claim that automated deployment has poor operational security
because:

"it necessarily grants read/write access to the certificate data
(including private key) to an automated, online, unsupervised system."

I've cleanly refuted that, showing that in a real, widely used system
neither read nor write access to the private key is needed to perform
automated certificate deployment. You do not need to like this, but to
insist that something false is "necessarily" true is ludicrous.

> So returning to the typical, as-specified-in-the-BRs validation
> challenges. Those generally either do not include the CSR in the
> challenge, or do so in a manner that would involve active checking
> rather than just trivial concatenation. These are the kind of
> challenges that require the site owner to consider IF they are in a
> certificate request process before responding.

I _think_ this means you still didn't grasp how ACME works, or even how
one would in general approach this problem. The CSR needs to go from
the would-be subscriber to the CA, it binds the SANs to the key pair,
proving that someone who knows the private key wanted a certificate for
these names. ACME wants to bind the names back to the would-be
subscriber, proving that whoever this is controls those names, and so
is entitled to such a certificate. It uses _different_ keys for that
precisely so that it doesn't need the TLS private key.
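
To make the offline part concrete, a sketch (Python, assuming a recent
pyca/cryptography library; names and paths are illustrative) of minting
the key pair and CSR once, under full supervision, for later use by
whatever client talks to the CA:

  from cryptography import x509
  from cryptography.hazmat.primitives import hashes, serialization
  from cryptography.hazmat.primitives.asymmetric import rsa
  from cryptography.x509.oid import NameOID

  key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
  csr = (x509.CertificateSigningRequestBuilder()
         .subject_name(x509.Name(
             [x509.NameAttribute(NameOID.COMMON_NAME, "www.example.com")]))
         .add_extension(x509.SubjectAlternativeName(
             [x509.DNSName("www.example.com")]), critical=False)
         .sign(key, hashes.SHA256()))

  # The CSR is public; only csr.pem ever leaves the supervised machine.
  open("csr.pem", "wb").write(csr.public_bytes(serialization.Encoding.PEM))
  open("key.pem", "wb").write(key.private_bytes(
      serialization.Encoding.PEM, serialization.PrivateFormat.PKCS8,
      serialization.NoEncryption()))  # or protect it per local policy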

But most centrally the Baseline Requirements aren't called the "Ideal
Goals" but only the "Baseline Requirements" for a reason. If a CA
approaches them as a target to be aimed for, rather than as a bare
minimum to be exceeded, we're going to have a problem. Accordingly the
Ten Blessed Methods aren't suggestions for how an ideal CA should
validate control of names, they're the very minimum you must do to
validate control of names. ACME does more, frankly any CA should be
aiming to do more.

> See for example NIST SP 1800-16B Prelim Draft 1, Section 5.1.4 which
> has this to say:
>
> "... It is possible to renew a certificate with the same public and
> private keys (i.e., not rekeying during the renewal process).
> However, this is only recommended when the private key is contained
> with a hardware security module (HSM) validated to Federal
> Information Processing Standards (FIPS) Publication 140-2 Level 2 or
> above"

Just before that sentence the current draft says:

"It is important to note that the validity period of a certificate is
different than the cryptoperiod of the public key contained in the
certificate and the corresponding private key."

Quite so. Thus, the only reason to change both at the same time is, as I
said, a convenience of scheduling; NIST does not claim that creating
certificates has any actual impact on the cryptoperiod, they just want
organisations to change their keys frequently, and "on renewal" is a
convenient time to schedule such a change.

Moreover, this is (a draft of) Volume B of NIST's guidance. There is an
entire volume, Volume C, about the use of automation, to be published
later. I have no idea what that will say, but I doubt it will begin by
insisting that you need read-write access to private keys to do
something people are already doing today without such access.


> I am referring to the very real facts that:
>
> - Many "config GUI only" systems request certificate import as
> PKCS#12 files or similar.

This is a real phenomenon, and encourages a lot of bad practices we've
discussed previously on m.d.s.policy. It even manages to make the
already confusing (for lay persons) question of what's "secret" and what
is not yet more puzzling, with IMNSHO minimal gains to show for it. Use
of PKCS#12 in this way can't be deprecated quickly enough for my liking.

[ This is also related to the Windows ecosystem in which there's a
pretence kept up that private keys aren't accessible once imported,
which of course isn't mechanically true since those keys are needed by
the system for it to work. So bad guys can ignore the documentation
saying it's impossible and just read the keys out of RAM with a trivial
program, but good guys can't get back their own private keys.
A true masterpiece of security engineering, presumably from the same
people who invented the LANMAN password hash. ]

> - Many open source TLS servers require supplying the private key as
> an unencrypted PKCS#8 PEM key file, either appended to the PEM file
> with the certificate chain or as a matching file with the same file name.

> Either of those requires private key access to change the certificate
> (for the parallel key file case, only if following the recommendation
> to rekey on each renewal).

This just imports from above the pretence that NIST's recommendation,
aimed at ensuring keys get changed _at all_, implies that all new
certificates must have new keys; it isn't a separate idea.

> The risk of forgetting to do the renewal (a self-inflicted risk that
> occurs only at the time when this should have been in your calendar)
> is substantially different than the risk of someone from the outside
> suddenly demanding doing the procedure as a rush job at a completely
> different time.

Today many large organisations struggle to do the first of these
things, adoption of automation would enable them to easily do both. In
practice what's happening is that they adopt "Cloud" technologies that
throw in the certificate automation for free.

[ e.g. Amazon will charge you cash money for a service that mints
certificates for internal use and manages their keys. But if you want
TLS certificates for the Web PKI those are free when Amazon hosts your
web site and will just happen automatically ]


> I also explained that ACME wasn't the target, and any mitigations
> specific to ACME are of little relevance.

Ensuring that the entity making the issuance request is also behind the
proof-of-control responses is a common sense feature, it just happens
to exceed what is mandated by the Baseline Requirements. This behaviour
would have mitigated a considerable number of goofs we've seen in the
last few years where CAs weren't actually achieving what they thought
they had in their validation methods.

It also isn't central to my argument since the key protected here is
not the TLS private key.

> Again, this is only for your chosen ACME example, while I was
> referring to traditional challenges closely matching what the BRs say
> should be done.

I've read the ACME documentation. I see there is at least some
documentation for Sectigo/Comodo's solution in this area but it
doesn't appear to cover the actual protocol itself. Who else has a
publicly documented automation protocol?


Nick.

Jakob Bohm

unread,
Dec 4, 2018, 8:55:58 AM12/4/18
to mozilla-dev-s...@lists.mozilla.org
On 04/12/2018 13:36, Nick Lamb wrote:
> On Tue, 4 Dec 2018 07:56:12 +0100
> Jakob Bohm via dev-security-policy
> <dev-secur...@lists.mozilla.org> wrote:
>
>> Which systems?
>
> As far as I'm aware, any of the automated certificate issuance
> technologies can be used here, ACME is the one I'm most familiar with
> because it is going through IETF standardisation and so we get to see
> not only the finished system but all the process and discussion.
>

Oh, so you meant "CA issuance systems and protocols with explicit
automation features" (as opposed to e.g. web server systems or operating
systems or site specific subscriber automation systems). That's why I
asked.

And note that this situation started with an OV certificate, not a DV
certificate. So more than domain ownership needs to be validated.

>> I prefer not to experiment with live certificates. Anyway, this was
>> never intended to focus on the specifics of ACME, since OV issuance
>> isn't ACME anyway.
>
> The direction of the thread was: Excuses for why a subscriber can't
> manage to replace certificates in a timely fashion. Your contribution
> was a claim that automated deployment has poor operational security
> because:
>
> "it necessarily grants read/write access to the certificate data
> (including private key) to an automated, online, unsupervised system."
>
> I've cleanly refuted that, showing that in a real, widely used system
> neither read nor write access to the private key is needed to perform
> automated certificate deployment. You do not need to like this, but to
> insist that something false is "necessarily" true is ludicrous.
>

You have shown that ONE system, which you happen to like, can avoid that
weakness, IF you ignore some other issues. You have not shown that
requiring subscribers to do this for any and all combinations of
validation systems and TLS server systems they encounter won't have this
weakness.

>> So returning to the typical, as-specified-in-the-BRs validation
>> challenges. Those generally either do not include the CSR in the
>> challenge, or do so in a manner that would involve active checking
>> rather than just trivial concatenation. These are the kind of
>> challenges that require the site owner to consider IF they are in a
>> certificate request process before responding.
>
> I _think_ this means you still didn't grasp how ACME works, or even how
> one would in general approach this problem. The CSR needs to go from
> the would-be subscriber to the CA, it binds the SANs to the key pair,
> proving that someone who knows the private key wanted a certificate for
> these names. ACME wants to bind the names back to the would-be
> subscriber, proving that whoever this is controls those names, and so
> is entitled to such a certificate. It uses _different_ keys for that
> precisely so that it doesn't need the TLS private key.

It means ACME is of very little relevance to OV and EV certificates from
most/all current OV and EV CAs.

>
> But most centrally the Baseline Requirements aren't called the "Ideal
> Goals" but only the "Baseline Requirements" for a reason. If a CA
> approaches them as a target to be aimed for, rather than as a bare
> minimum to be exceeded, we're going to have a problem. Accordingly the
> Ten Blessed Methods aren't suggestions for how an ideal CA should
> validate control of names, they're the very minimum you must do to
> validate control of names. ACME does more, frankly any CA should be
> aiming to do more.

I made no such claim. I was saying that your hypothetical that all/most
validation systems have the properties of ACME and that all/most TLS
servers allow certificate replacement without access to the private key
storage represents an idealized scenario different from practical
reality.

>
>> See for example NIST SP 1800-16B Prelim Draft 1, Section 5.1.4 which
>> has this to say:
>>
>> "... It is possible to renew a certificate with the same public and
>> private keys (i.e., not rekeying during the renewal process).
>> However, this is only recommended when the private key is contained
>> with a hardware security module (HSM) validated to Federal
>> Information Processing Standards (FIPS) Publication 140-2 Level 2 or
>> above"
>
> Just before that sentence the current draft says:
>
> "It is important to note that the validity period of a certificate is
> different than the cryptoperiod of the public key contained in the
> certificate and the corresponding private key."

And the paragraph I quoted says not to do that unless you are using an
HSM, which very few subscribers do.

>
> Quite so. Thus, the only reason to change both at the same time is, as I
> said, a convenience of scheduling; NIST does not claim that creating
> certificates has any actual impact on the cryptoperiod, they just want
> organisations to change their keys frequently, and "on renewal" is a
> convenient time to schedule such a change.

It is not a convenience of scheduling. It is a security best practice,
called out (as the first example found) in that particular NIST
document.

>
> Moreover, this is (a draft of) Volume B of NIST's guidance. There is an
> entire volume, Volume C, about the use of automation, to be published
> later. I have no idea what that will say, but I doubt it will begin by
> insisting that you need read-write access to private keys to do
> something people are already doing today without such access.
>

Which has absolutely no bearing on the rule that keys stored outside an
HSM should (as a best practice) be changed on every reissue. It would
be contradictory if part B says not to reuse keys, and part C then
prescribes an automation method violating that.


>
>> I am referring to the very real facts that:
>>
>> - Many "config GUI only" systems request certificate import as
>> PKCS#12 files or similar.
>
> This is a real phenomenon, and encourages a lot of bad practices we've
> discussed previously on m.d.s.policy. It even manages to make the
> already confusing (for lay persons) question of what's "secret" and what
> is not yet more puzzling, with IMNSHO minimal gains to show for it. Use
> of PKCS#12 in this way can't be deprecated quickly enough for my liking.
>

So it is real.

> [ This is also related to the Windows ecosystem in which there's a
> pretence kept up that private keys aren't accessible once imported,
> which of course isn't mechanically true since those keys are needed by
> the system for it to work. So bad guys can ignore the documentation
> saying its impossible and just read the keys out of RAM with a trivial
> program, but good guys can't get back their own private keys.
> A true masterpiece of security engineering, presumably from the same
> people who invented the LANMAN password hash. ]
>
>> - Many open source TLS servers require supplying the private key as
>> an unencrypted PKCS#8 PEM key file, either appended to the PEM file
>> with the certificate chain or as a matching file with the same file name.
>
>> Either of those requires private key access to change the certificate
>> (for the parallel key file case, only if following the recommendation
>> to rekey on each renewal).
>
> This just imports from above the pretence that NIST's recommendation,
> aimed at ensuring keys get changed _at all_, implies that all new
> certificates must have new keys; it isn't a separate idea.

No, I am stating that:

- For systems that want the certificate as a PKCS#12 file only,
certificate import requires private key import and thus private key
write access.

- For systems that append the private key pem file to the certificate
chain PEM file, certificate import requires write access to the file
storing the private key.

- For systems that put the key and certificate chain in parallel PEM
files (such as Apache HTTPD), granting write access to the certificate
but not the private key is potentially possible, though not
necessarily. For example, typical Apache HTTPD configurations place
the certificate and key files in the same POSIX disk directory, where
write access would be granted to the operator installing new
certificates.

>
>> The risk of forgetting to do the renewal (a self-inflicted risk that
>> occurs only at the time when this should have been in your calendar)
>> is substantially different than the risk of someone from the outside
>> suddenly demanding doing the procedure as a rush job at a completely
>> different time.
>
> Today many large organisations struggle to do the first of these
> things, adoption of automation would enable them to easily do both. In
> practice what's happening is that they adopt "Cloud" technologies that
> throw in the certificate automation for free.
>
> [ e.g. Amazon will charge you cash money for a service that mints
> certificates for internal use and manages their keys. But if you want
> TLS certificates for the Web PKI those are free when Amazon hosts your
> web site and will just happen automatically ]
>

This assumes that granting a big global cloud provider easy access to
your organization's private keys is considered an acceptable risk, which
is not at all a given.

For a non-US government entity (as was the case with the malformed
D-TRUST certificate), there may very well be hard requirements against
that. In particular given their decision to use a national CA instead
of a global CA like Comodo.


>
>> I also explained that ACME wasn't the target, and any mitigations
>> specific to ACME are of little relevance.
>
> Ensuring that the entity making the issuance request is also behind the
> proof-of-control responses is a common sense feature, it just happens
> to exceed what is mandated by the Baseline Requirements. This behaviour
> would have mitigated a considerable number of goofs we've seen in the
> last few years where CAs weren't actually achieving what they thought
> they had in their validation methods.
>
> It also isn't central to my argument since the key protected here is
> not the TLS private key.
>

And again you are arguing as if ACME is a typical case.

>> Again, this is only for your chosen ACME example, while I was
>> referring to traditional challenges closely matching what the BRs say
>> should be done.
>
> I've read the ACME documentation. I see there is at least some
> documentation for Sectigo/ Comodo's solution in this area but it
> doesn't appear to cover the actual protocol itself. Who else has a
> publicly documented automation protocol ?
>

And there you assume that automation is the norm. Which I am arguing it
is not.

Nick Lamb

unread,
Dec 4, 2018, 7:05:53 PM12/4/18
to dev-secur...@lists.mozilla.org, Jakob Bohm
On Tue, 4 Dec 2018 14:55:47 +0100
Jakob Bohm via dev-security-policy
<dev-secur...@lists.mozilla.org> wrote:

> Oh, so you meant "CA issuance systems and protocols with explicit
> automation features" (as opposed to e.g. web server systems or
> operating systems or site specific subscriber automation systems).
> That's why I asked.

Yes. These systems exist, have existed for some time, and indeed now
appear to make up a majority of all issuance.

> And note that this situation started with an OV certificate, not a DV
> certificate. So more than domain ownership needs to be validated.

Fortunately it is neither necessary nor usual to insist upon fresh
validations for Organisational details for each issuance. Cached
validations can be re-used for a period specified in the BRs although
in some cases a CA might choose tighter constraints.

> You have shown that ONE system, which you happen to like, can avoid
> that weakness, IF you ignore some other issues. You have not shown
> that requiring subscribers to do this for any and all combinations of
> validation systems and TLS server systems they encounter won't have
> this weakness.

Yes, an existence proof. Subscribers must of course choose trade-offs
that they're comfortable with. That might mean accepting that your web
site could become unavailable for a period of several days at short
notice, or that you can't safely keep running Microsoft IIS 6.0 even
though you'd prefer not to upgrade. What I want to make clear is that
offering automation without write access to the private key is not only
theoretically conceivable, it's actually easy enough that a bunch of
third party clients do it today because it was simpler than whatever
else they considered.

> I made no such claim. I was saying that your hypothetical that
> all/most validation systems have the properties of ACME and that
> all/most TLS servers allow certificate replacement without access to
> the private key storage represents an idealized scenario different
> from practical reality.

Subscribers must choose for themselves, in particular it does not
constitute an excuse as to why they need more time to react. Choices
have consequences, if you choose a process you know can't be done in a
timely fashion, it won't be done in a timely fashion and you'll go
off-line.

> And the paragraph I quoted says not to do that unless you are using an
> HSM, which very few subscribers do.

It says it only recommends doing this for a _renewal_ if you have an
HSM. But a scheduled _renewal_ already provides sufficient notice for
you to replace keys and make a fresh CSR at your leisure if you so
choose. Which is why you were talking about unscheduled events.

If you have a different reference which says what you originally
claimed, I await it.

> It is not a convenience of scheduling. It is a security best
> practice, called out (as the first example found) in that particular
> NIST document.

If that was indeed their claimed security best practice the NIST
document would say you must replace keys every time you replace
certificates, for which it would need some sort of justification, and
there isn't one. But it doesn't - it recommends you _renew_ once per
year‡, and that you should change keys when you _renew_, which is to
say, once per year.

‡ Technically this document is written to be copy-pasted into a three
ring binder for an organisation, so you can just write in some other
amount of time instead of <one year or less>. As with other documents of
this sort it will not achieve anything on its own.

> Which has absolutely no bearing on the rule that keys stored outside
> an HSM should (as a best practice) be changed on every reissue. It
> would be contradictory if part B says not to reuse keys, and part C
> then prescribes an automation method violating that.

There is no such rule listed in that NIST document. The rule you've
cited talks about renewals, but a reissue is not a renewal. There was
nothing wrong with the expiry date for the certificate; that's not why
it was replaced.

There are however several recommendations which contradict this idea
that it's OK to have processes which take weeks to act, such as:

"System owners MUST maintain the ability to replace all certificates on
their systems within <2> days to respond to security incidents"

"Private keys, and the associated certificates, that have the
capability of being directly accessed by an administrator MUST be
replaced within <30> days of reassignment or <5> days of termination of
that administrator"


The NIST document also makes many other recommendations that - like the
one year limit - won't be followed by most real organisations; such as a
requirement to add CAA records, to revoke all their old certificates
a short time after they're replaced, the insistence on automation for
adding keys to "SSL inspection" type capabilities or the prohibition of
all wildcards.

> So it is real.

Oh yes, doing things that are a bad idea is very real. That is, after
all, why we're discussing this at all.

> - For systems that want the certificate as a PKCS#12 file only,
> certificate import requires private key import and thus private key
> write access.

Yup. This is a bad design. It's come up before. It's not our place to
tell programmers they can't do this, but it's certainly within our
remit (or indeed NIST's) to remind users that software designed this
way doesn't help them achieve their security goals. It can go on that
big heap of NIST recommendations actual users will ignore.

> - For systems that append the private key pem file to the certificate
> chain PEM file, certificate import requires write access to the file
> storing the private key.

This is also bad design but it's pretty trivial to "mask out" in a
wrapper of the software. I'm sure there are programs where this is
mandatory but in the ones I've seen it's usually an option rather than
the only way to provide certificates.

> - For systems that put the key and certificate chain in parallel PEM
> files (such as Apache HTTPD), granting write access to the
> certificate but not the private key is potentially possible, though
> not necessarily. For example, typical Apache HTTPD configurations
> place the certificate and key files in the same POSIX disk directory,
> where write access would be granted to the operator installing new
> certificates.

Directory permissions might be one of the POSIX features most likely to
be misunderstood by people (as distinct from them knowing they don't
understand it). The operator writing to a certificate file does NOT need
write permission for a directory that certificate is in, such permission
would let them change the directory, which isn't what they need to do.
That operator only needs permission to write to the certificate file.
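
A sketch of the distinction (Python; the path and the source of the new
chain are hypothetical):

  import os

  new_chain_pem = open("new-chain.pem", "rb").read()  # fetched from the CA

  # Rewriting an existing certificate in place needs write permission on
  # the FILE only; the directory's write bit governs creating, deleting
  # and renaming entries, not rewriting their contents.
  fd = os.open("/etc/tls/site/certchain.pem", os.O_WRONLY | os.O_TRUNC)
  os.write(fd, new_chain_pem)
  os.close(fd)

(An atomic rename into place is different: rename does need the
directory's write bit, which is one reason this narrow permission model
uses the in-place rewrite.)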

Moreover, in a truly automated system we should distinguish between
the permissions granted to the system and that fraction available to a
human user of the system. It is entirely possible that an automated
system which is technically permitted to write to a private key file is
not, in fact, designed to do so and does not do so, so that its user
cannot cause this to happen as a result of using the system.

> This assumes that granting a big global cloud provider easy access to
> your organization's private keys is considered an acceptable risk,
> which is not at all a given.

It may not be. Of course whether using cloud services in fact gives
them "easy access to your organisation's private keys" is a matter of
some debate, you will certainly find representatives of the major cloud
service providers happy to explain why they think their systems offer
better safeguards against malfeasance than whatever home-brew system
your organisation has itself.

One of the nice effects in automation at scale is that you can resist
the temptation to do things manually since it becomes necessarily more
work than automating them. This results in a situation where, say, an
AWS engineer isn't allowed to log into a customer's virtual machine and
tinker with their private keys NOT just out of a sense of the importance
of customer privacy but because doing so will never scale across the
platform. Any engineer who wants to do this is Bad at their job, even if
they aren't in fact a privacy-invading snoop, so there's no reason to
make it possible and every reason to detect them and fire them.

Symantec was never able to wean itself off the practice of manually
issuing certificates, even after years of problems caused by exactly
that approach. In contrast as I understand it ISRG / Let's Encrypt
obtained even their Mozilla-required test leaf certificates by...
actually requesting them through their automated issuance system just
like an end user.

> And there you assume that automation is the norm. Which I am arguing
> it is not.

Well there's the thing. In terms of volume it is. That sort of thing
will sneak up on you with automation.

The largest CA by far in volume terms is ISRG's Let's Encrypt which of
course only issues with ACME. The second largest is probably Comodo /
Sectigo which issues a huge volume for cPanel (an automation solution)
and Cloudflare (also automated). Some fraction of the certs at second
tier CAs like DigiCert are automated but I would not hazard a guess at
how many.

Nick.

Jakob Bohm

unread,
Dec 5, 2018, 4:20:12 PM12/5/18
to mozilla-dev-s...@lists.mozilla.org

On 05/12/2018 01:05, Nick Lamb wrote:
> On Tue, 4 Dec 2018 14:55:47 +0100
> Jakob Bohm via dev-security-policy
> <dev-secur...@lists.mozilla.org> wrote:
>
>> Oh, so you meant "CA issuance systems and protocols with explicit
>> automation features" (as opposed to e.g. web server systems or
>> operating systems or site specific subscriber automation systems).
>> That's why I asked.
>
> Yes. These systems exist, have existed for some time, and indeed now
> appear to make up a majority of all issuance.
>

I didn't doubt that automation systems exist; I was thoroughly confused
when, a few messages back, you wrote a reference to "these systems"
without stating which systems.

>> And note that this situation started with an OV certificate, not a DV
>> certificate. So more than domain ownership needs to be validated.
>
> Fortunately it is neither necessary nor usual to insist upon fresh
> validations for Organisational details for each issuance. Cached
> validations can be re-used for a period specified in the BRs although
> in some cases a CA might choose tighter constraints.
>

However, an OV or EV issuance often involves substantially different
choices for domain validation and especially for validating the CSR-to-
subscriber-identity relationship than the choices made for robotic DV
issuance systems, even when the organizational identity validation is
cached. For example, I know of at least one CA where the process
involves a subscriber representative signing a paper form with a
printout of the CSR (as one of multiple steps).

>> You have shown that ONE system, which you happen to like, can avoid
>> that weakness, IF you ignore some other issues. You have not shown
>> that requiring subscribers to do this for any and all combinations of
>> validation systems and TLS server systems they encounter won't have
>> this weakness.
>
> Yes, an existence proof. Subscribers must of course choose trade-offs
> that they're comfortable with. That might mean accepting that your web
> site could become unavailable for a period of several days at short
> notice, or that you can't safely keep running Microsoft IIS 6.0 even
> though you'd prefer not to upgrade. What I want to make clear is that
> offering automation without write access to the private key is not only
> theoretically conceivable, it's actually easy enough that a bunch of
> third party clients do it today because it was simpler than whatever
> else they considered.

An existence proof is good for refuting a claim that something doesn't
exist. It does nothing to prove that what exists is the only good thing.

Nothing I wrote has any relationship to Microsoft software specifics
(except for my brief reply to your own aside about another Microsoft
technology).

You have yet to point out any non-ACME client that organizations can
use to automate the renewal and replacement of OV and EV certificates
without write access to the private key, thus I can not validate your
claims that there are "a bunch of third party clients" doing that.
You have only made some claims about what would be theoretically
possible for the ACME HTTP-01 protocol.

(You mention cPanel below, more there).

>
>> I made no such claim. I was saying that your hypothetical that
>> all/most validation systems have the properties of ACME and that
>> all/most TLS servers allow certificate replacement without access to
>> the private key storage represents an idealized scenario different
>> from practical reality.
>
> Subscribers must choose for themselves, in particular it does not
> constitute an excuse as to why they need more time to react. Choices
> have consequences, if you choose a process you know can't be done in a
> timely fashion, it won't be done in a timely fashion and you'll go
> off-line.

The choice of validation protocol is one made by the CA, subscribers
have little influence except where a CA happens to offer more than
one validation method or where multiple CAs are otherwise equal in
terms of the subscribers selection criteria.

Outside of the pressure this community puts on CAs, there is very
little reason why subscribers should expect CAs to suddenly revoke
their certificates for entirely CA-internal reasons. Therefore it is
unreasonable to expect the general population of site-owning
organizations to plan on the basis that this is a risk worth
planning for.

>
>> And the paragraph I quoted says not to do that unless you are using an
>> HSM, which very few subscribers do.
>
> It says it only recommends doing this for a _renewal_ if you have an
> HSM. But a scheduled _renewal_ already provides sufficient notice for
> you to replace keys and make a fresh CSR at your leisure if you so
> choose. Which is why you were talking about unscheduled events.
>
> If you have a different reference which says what you originally
> claimed, I await it.
>

Now you are going off on a huge tangent about the detailed specifics
of that particular document and its choice of words. The document was
arbitrarily chosen as the first one I could dig up mentioning this
long-standing general practice of "one cert=one key".

As a paying subscriber at other CAs, I would expect a CA-forced sudden
reissue to at least include a complimentary extension of validity, as
compensation for the sudden loss of service availability (I am talking
about the availability of the CA service, not the availability of the
TLS service that relies on the CA). This would often mean that the
replacement cert would have a validity beyond the end of the original
cert, thus justifying the need to give it a new key for crypto-period
reasons alone.
(Here you snipped a change of subject)

>> So it is real.
>
> Oh yes, doing things that are a bad idea is very real. That is, after
> all, why we're discussing this at all.

No, we are discussing whether it is reasonable to expect regular organizations
to handle CA-initiated sudden revocations either by having a 24/7/365
security staff with the ability and authority to handle this or by having
a robotic script that can handle such events via a (yet to be defined)
CA-to-subscriber notification protocol.

One of my arguments for saying it is unreasonable to expect regular
organizations (not big CAs) to have that ability is that whatever handles
the request at the subscriber end (whether a robot or a human) will in
many practical cases need privileged access to the private key, which is
something that should not be granted to extraneous 4th shift techs or
Internet-launchable customized scripts.

Systems that need the certificate to be input in PKCS#12 form are one
example of systems where a certificate cannot be replaced without access
to the private key, even if (as you keep wanting) the certificate would
be issued for the same keypair as the old certificate.

>
>> - For systems that want the certificate as a PKCS#12 file only,
>> certificate import requires private key import and thus private key
>> write access.
>
> Yup. This is a bad design. It's come up before. It's not our place to
> tell programmers they can't do this, but it's certainly within our
> remit (or indeed NIST's) to remind users that software designed this
> way doesn't help them achieve their security goals. It can go on that
> big heap of NIST recommendations actual users will ignore.
>

Other than the weakness of some historic PKCS#12 implementations (limited
to 40-bit keys!), using PKCS#12 files as the software equivalent of a
crypto ignition key is not fundamentally flawed. Especially if there is
a desire to generate the private key using a dedicated key generation
facility (such as the ones alluded to in various NIST documents).

One way to use PKCS#12 key+cert installation in a high-security manner is
to have the key-generation facility put the PKCS#12 file on a removable
medium, transport that medium in a sealed container to the server facility,
then having a two-man team install the PKCS#12 file, with one person having
the medium and the other knowing the random password, then securely
destroying the medium. Neither person is allowed to copy the medium, and
the password plus file never coexist outside the target server and key
generation facility.

>> - For systems that append the private key pem file to the certificate
>> chain PEM file, certificate import requires write access to the file
>> storing the private key.
>
> This is also bad design but it's pretty trivial to "mask out" in a
> wrapper of the software. I'm sure there are programs where this is
> mandatory but in the ones I've seen it's usually an option rather than
> the only way to provide certificates.
>

And the (non-)security of such a wrapper implementation was part of my
initial argument.

Anyway, keeping key+cert chain in a single file provides the desirable
property that normal cert replacement (planned renewal with a fresh key)
can be done atomically with a single "mv -f new.ext current.ext" on a
running system (rename(2) is atomic on POSIX when both names are on the
same filesystem; a cross-filesystem mv degrades to copy-and-unlink and
does leave a window of non-existence).
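
The same swap as a sketch (Python; os.replace maps to rename(2), and
the paths are illustrative):

  import os

  # new.pem: fresh key + new certificate chain, already written with
  # restrictive permissions on the SAME filesystem as the live file.
  os.replace("/etc/tls/site/new.pem", "/etc/tls/site/current.pem")
  # A running server keeps the old inode open until it reloads, so
  # readers never observe a missing or half-written file.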

>> - For systems that put the key and certificate chain in parallel PEM
>> files (such as Apache HTTPD), granting write access to the
>> certificate but not the private key is potentially possible, though
>> not necessarily. For example, typical Apache HTTPD configurations
>> place the certificate and key files in the same POSIX disk directory,
>> where write access would be granted to the operator installing new
>> certificates.
>
> Directory permissions might be one of the POSIX features most likely to
> be misunderstood by people (as distinct from them knowing they don't
> understand it). The operator writing to a certificate file does NOT need
> write permission for a directory that certificate is in, such permission
> would let them change the directory, which isn't what they need to do.
> That operator only needs permission to write to the certificate file.

Yes, this could be done, if everything was designed around this rare
scenario rather than normal operations and system emergencies. Normal
certificate operations more commonly involve adding additional
certificates for additional domain names than they involve replacing
certificates at external request.

Something like

drwxrwx--- root   www   4096 Feb 28  2017 .
-rw-r----- robot  www  12345 Feb 28  2018 certchain.pem
-rw-r----- keygen www   3272 Nov 30  2017 certchain.key

With the webserver somehow dropping dir access after loading keys,
despite already not running as root.
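
For concreteness, a hypothetical sequence producing roughly that layout
(the path, the robot and keygen accounts, and the www group are all
assumptions):

  install -d -m 770 -o root   -g www /etc/webkeys
  install    -m 640 -o robot  -g www certchain.pem /etc/webkeys/
  install    -m 640 -o keygen -g www certchain.key /etc/webkeys/

Here the robot account can overwrite the chain but not the key, only the
keygen account can replace the key, and group www can merely read both
files.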

>
> Moreover, in a truly automated system we should distinguish between
> the permissions granted to the system and that fraction available to a
> human user of the system. It is entirely possible that an automated
> system which is technically permitted to write to a private key file is
> not, in fact, designed to do so and does not do so, so that its user
> cannot cause this to happen as a result of using the system.

My criticism of automated systems was about the risk that such a system
contained a security bug whereby an outside attacker could cause the
system to do something other than intended.

>
>> This assumes that granting a big global cloud provider easy access to
>> your organization's private keys is considered an acceptable risk,
>> which is not at all a given.
>
> It may not be. Of course whether using cloud services in fact gives
> them "easy access to your organisation's private keys" is a matter of
> some debate; you will certainly find representatives of the major cloud
> service providers happy to explain why they think their systems offer
> better safeguards against malfeasance than whatever home-brew system
> your organisation has itself.

Marketing != Truth.

>
> One of the nice effects of automation at scale is that you can resist
> the temptation to do things manually since it becomes necessarily more
> work than automating them. This results in a situation where, say, an
> AWS engineer isn't allowed to log into a customer's virtual machine and
> tinker with their private keys NOT just out of a sense of the importance
> of customer privacy but because doing so will never scale across the
> platform. Any engineer who wants to do this is Bad at their job, even if
> they aren't in fact a privacy-invading snoop, so there's no reason to
> make it possible and every reason to detect them and fire them.

This is a property of AWS's cost-cutting measures. There are entire
companies founded on providing engineers who do log on to customers'
AWS-hosted VMs as a value-adding service.

And anyway, one fear with global cloud companies is that data might be
stolen or mangled via automation at scale, perhaps at the request of
foreign governments (remember the certificate in question was issued
to a government facility).

>
> Symantec was never able to wean itself off the practice of manually
> issuing certificates, even after years of problems caused by exactly
> that approach. In contrast as I understand it ISRG / Let's Encrypt
> obtained even their Mozilla-required test leaf certificates by...
> actually requesting them through their automated issuance system just
> like an end user.
>

Manually issuing certificates at a high-volume CA is unrelated to
manually authorizing certificate requests at organizations with a
small number of certificates.

>> And there you assume that automation is the norm. Which I am arguing
>> it is not.
>
> Well there's the thing. In terms of volume it is. That sort of thing
> will sneak up on you with automation.
>
> The largest CA by far in volume terms is ISRG's Let's Encrypt which of
> course only issues with ACME. The second largest is probably Comodo /
> Sectigo which issues a huge volume for cPanel (an automation solution)
> and Cloudflare (also automated). Some fraction of the certs at second
> tier CAs like DigiCert are automated but I would not hazard a guess at
> how many.
>

Automation can produce a lot of noise, overwhelming any statistic that
weights an automated certificate the same as a manually requested one.

And this is the first time you have mentioned that cPanel has an
automation interface to Sectigo. I've never really looked at that
software, but I now wonder whether it has the other properties that you
assume an automation system should have:

- Ability to replace OV/EV certificates at short notice without having
to wake up the site owner and convince them you are not a tech-support
scammer.

- No ability for anyone (including the site owner) to overwrite the
  private key via an Internet-exposed interface.

Looking at documentation.cpanel.net, I see little sign of either property.

westm...@gmail.com

unread,
Dec 10, 2018, 12:31:51 AM12/10/18
to mozilla-dev-s...@lists.mozilla.org
Hello,
Will D-TRUST be removed in the future, or is this just another toothless
"final warning"? :)

Andrew.