All,
Obviously this is not the message we would have liked to read, and I will try to explain and rebut, as far as possible, some of the comments posted here.
>
> The Mozilla CA Certificates team has been considering what the appropriate
> next steps are for the inclusion request from the CA "StartCom".[0] As readers
> will know, this CA has previously been removed from trust[1], and so a re-
> application obviously involves particular scrutiny. In addition, several
> questions have been raised about the way in which the new StartCom PKI has
> been operated technically[2]. This is a proposal for the way forward, on which
> comments are invited.
>
> Mozilla's considered view is as follows:
>
> * It should have been obvious to StartCom that testing of their new systems
> needed to be done using a parallel testing hierarchy.
Those tests were done to check the CT behaviour. They were not a general test of the new systems, only of CT logging. Those certs were under control the whole time and lived for only a few minutes, because they were revoked immediately after we checked that they had been logged correctly in the CT logs. It was not a mis-issuance in the sense that we didn't know what had happened and had to investigate, etc. It was not good practice and I can't excuse that, but it was not related to the regular issuance procedure, as someone suggested. We provided a report describing everything that happened and what we did to prevent it from happening again, namely updating the EJBCA role permissions.
> That it was not obvious, is deeply concerning. It is also concerning that someone can sit at a terminal
> and issue random certificates with variable values in lots of fields, in what is
> to become a publicly-trusted hierarchy.
Well, it was possible at that time, but only the CA administrator could do it, and only under many requirements. It's not like sitting at a terminal and starting to issue certificates; there were, and are, security mechanisms to prevent "someone" from doing that, and I can list many. Probably most CA administrators at other CAs had this capability (maybe not now), because the majority of PKI software allows it and it's needed when building a hierarchy.
> It's not about numbers (e.g. "40 out of 50000"), it's about the process.
>
This number of 40 is the total of "mis-issuances" discovered, not only those related to the CT testing. And at other times, in discussions on this list, the number has mattered. Moreover, most of those 40 were "discovered" by us, and we acted accordingly, as per the BRs. We revoked the majority of them within 24 hours of being notified internally. When they were posted in Bugzilla, as said, most were already revoked and we had started investigating what happened and what actions needed to be taken. Some of these "mis-issuances" were due to inconsistencies between the BRs and the Mozilla policy, such as the use of different curves (allowed by the BRs but not by some browsers), or pre-certificates, where it is not clear which requirements they fall under, as in the discussion Jeremy started on the list. For example, is it necessary to also revoke the pre-certificate when a certificate is revoked? Do pre-certificates need to be considered certificates and meet the BRs and/or Mozilla policy?
Or the use of Unicode vs. punycode, which is still under discussion; a ballot on it even failed in the CABF. So the errors we made were also made by others, and, not as an excuse, it seems that these points were not clear to the CAs.
In mid-July we updated our issuance procedure to prevent these issues from happening again. What did we do?
- Restricted the use of elliptic curves to those accepted by Mozilla
- Changed the certificate profile to remove inconsistencies in the key encipherment and key agreement key usages
- Updated the internal database of country codes
- Updated the systems to convert all domain names to punycode
- And, recently, developed a CSR checking tool to catch the issue with RSA parameters, because EJBCA didn't have one at that time (it is included in the new release, 6.9.0)
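To illustrate the kind of checks in the list above, here is a minimal Python sketch. It is not our actual CSR tool; it assumes the public key has already been extracted from the CSR into plain values, and `RequestedKey`, `check_key` and `to_a_labels` are hypothetical names. The thresholds follow my reading of the Mozilla Root Store Policy as of 2017 (RSA modulus at least 2048 bits and divisible by 8; ECDSA only on P-256 or P-384) and BR section 6.1.6 (odd public exponent of at least 3, ideally between 2^16+1 and 2^256-1, and no modulus factors smaller than 752). The domain conversion uses Python's built-in `idna` codec, which implements IDNA 2003, not IDNA 2008.

```python
from dataclasses import dataclass
from typing import List, Optional

# Curves accepted under Mozilla's policy as of 2017 (assumption: P-521 excluded).
ALLOWED_CURVES = {"P-256", "P-384"}


@dataclass
class RequestedKey:
    """Hypothetical, already-parsed view of the public key in a CSR."""
    kind: str                       # "RSA" or "EC"
    modulus: Optional[int] = None   # RSA n
    exponent: Optional[int] = None  # RSA e
    curve: Optional[str] = None     # EC curve name


def _primes_below(limit: int) -> List[int]:
    """Primes below `limit`, using a simple sieve of Eratosthenes."""
    sieve = [True] * limit
    sieve[0] = sieve[1] = False
    for i in range(2, int(limit ** 0.5) + 1):
        if sieve[i]:
            sieve[i * i::i] = [False] * len(sieve[i * i::i])
    return [i for i, is_prime in enumerate(sieve) if is_prime]


SMALL_PRIMES = _primes_below(752)


def check_key(key: RequestedKey) -> List[str]:
    """Return a list of problems; an empty list means the key passes."""
    problems: List[str] = []
    if key.kind == "EC":
        if key.curve not in ALLOWED_CURVES:
            problems.append(f"curve {key.curve} not allowed")
    elif key.kind == "RSA":
        n, e = key.modulus, key.exponent
        if n is None or e is None:
            return ["RSA modulus or exponent missing"]
        bits = n.bit_length()
        if bits < 2048 or bits % 8 != 0:
            problems.append(f"modulus size {bits} bits not allowed")
        if e < 3 or e % 2 == 0:
            problems.append("exponent must be odd and >= 3")
        elif not (2**16 + 1 <= e <= 2**256 - 1):
            problems.append("exponent outside recommended range")
        # Trial division also catches an even modulus (factor 2).
        if any(n % p == 0 for p in SMALL_PRIMES):
            problems.append("modulus has a factor smaller than 752")
    else:
        problems.append(f"unsupported key type {key.kind}")
    return problems


def to_a_labels(domain: str) -> str:
    """Convert a domain name to its ASCII (punycode) form, lowercased."""
    return domain.encode("idna").decode("ascii").lower()
```

For example, `check_key(RequestedKey("EC", curve="P-521"))` reports the curve, and `to_a_labels("bücher.example")` yields `xn--bcher-kva.example`, the form expected in a dNSName.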
> As JC Jones wrote:
>
> "This is a professional PKI operation, being overseen by industry veterans. If
> something as concrete as the issuance process had such a glaring quality
> assurance methodology failure, why should anyone believe that something
> much harder -- subscriber validation -- is going to be done correctly?"
Well, this is an opinion, and I fully respect it, but no one is free of failures, and Let's Encrypt (and many others) are also having them, as we've seen recently with weak keys, etc. I'm nobody to say they are not professional, or that they lack a quality assurance methodology, ... or to believe they are not acting correctly. Surely they are, and so are we. I can't criticize anyone, and it's not because we're in a weak position at StartCom, where everyone is closely scrutinizing what we do.
For all these failures we acted quickly because our internal procedures worked well and, as said, by the time they were reported most of them were already revoked and solutions were under way. It's not like "thanks for letting us know, we didn't know, we are going to investigate, etc."
I can quote other opinions here. For example, Matthew Hardeman wrote:
"If Inigo has prior CA management experience and is running the technical picture at Startcom now, why not allow them to proceed under this new PKI infrastructure with past issues set aside and take a serious stance to any issues going forward.
As far as I know, the current manager of Startcom has not been previously accused of deception or bad action. Far more than has been problematic in this early testing phase of their new PKI has been forgiven by the root programs before.
Nothing disastrous or intentionally dishonest has been done in their new PKI. Why not grant them a gentleman's chance to proceed and address any further issues with great scrutiny?"
>
> * The key for their new root certificate was also used in a couple of
> intermediates (one revoked as it was done incorrectly - again, lack of
> testing!). While this is probably not a policy violation, it's not good practice.
>
Yes, it's not a policy violation. As explained, this was a problem in EJBCA with the UTF-8 encoding. It's not related to a lack of testing: we generated intermediates in our development and QA systems using the same procedure, and we followed it. Nothing happened with the others, but this one had the issue, so we had to revoke it and create a new one. This happened in April.
> * StartCom's infrastructure audit, performed by Cure53, was frankly a security
> disaster. (They are using EJBCA for CA operations; this was an audit of their
> front end and customer management systems, which were rewritten by a
> team from their new owner, Qihoo 360.) The (PHP) codebase was full of
> holes, poorly commented, had few or no tests, and showed every evidence of
> being hacked together in an enormous rush. This does not inspire confidence.
> Cure53 say they retested a couple of months later and most of the holes they
> found were fixed - although they found quite a few more. All this does not
> bode well - Cure53 are not infallible, an audit is not a substitute for secure
> coding practices, and the initial results show that the software was clearly not
> built by people who understand software security. The summary of their
> results is attached to their Action Items bug[3], but it does leave out some of
> the more critical passages of commentary from the original, and of course
> does not show the particular holes found and their scope and severity.
Yes, it's true. The first security audit didn't go very well. That was also mentioned during the CABF F2F meeting at Cisco. In our remediation plan we set ourselves a very tight deadline for this task, and we failed. It was a very hard task in very little time, but the people at 360 tried everything to get it done by that date, the end of December 2016, and yes, we met the date, but with many failures. I think everyone has experienced this type of situation, and no one can write code without errors the first time. So, of course, we went for "another round", because that was not acceptable. The second audit, as you mention, shows that the issues found in the first one were fixed and some new ones came out, but these were also fixed later on. Since then, the R&D team and the security team have evolved the system, and right now it is robust.
In any case, until we had the OK from Cure53, we didn't go further and didn't go live. Later on we generated the subCAs in production and then started to issue certificates.
And certainly Cure53 is not infallible, but it's true that those security audits went very deep, and that the security team at 360 has continued improving the system.
>
> * The WT/BR/EV audits on StartCom's website are significantly qualified, and
> they include lack of controls on issuance. They should have clean ones done
> before we permit any inclusion request to proceed. The qualifications include:
>
> - Risk analysis process defined but not implemented
> - Business continuity plan defined but not implemented
> - Audit logs not guaranteed to have integrity
> - Monitoring system cannot detect security-related changes to
> Certificate Systems
>
Yes, this is also true. Our WebTrust audits have findings, but according to the auditors who signed the reports they are not so significant, so I assume the auditors considered the system good enough to issue the audit reports. Of course, I would have liked to have a clean one, but it's also true that the reports indicate that most of the findings are fixed, and I think that's a matter of transparency.
We prepared a Corrective Action Plan providing solutions for all the findings and indicating when each would be applied. We sent it to the auditors with all the evidence showing that most were properly fixed.
I'd like to explain the ones you mention.
Risk analysis: the risk analysis was defined and was implemented. We had done the 2016 risk analysis, and the new 2017 risk analysis was scheduled for October this year, following the agreed schedule. The auditors requested the 2017 one, and hence we did it just after the audit. In any case, the 2017 risk analysis is done and has been sent to the auditors.
Business Continuity Plan: the BCP was defined and implemented, but only partially. We were very optimistic about meeting everything we wrote, and at the time of the audit some things in the document were not finished, hence the finding. Of course, I could have written a less onerous BCP that matched what we had in place and avoided the finding, but I wanted a very good one, and so I didn't mind that finding. BTW, we've now completed our BCP according to the document.
Audit log integrity: all logs were, and are, signed internally in the PKI system (it's part of the EJBCA configuration), and we provided all the evidence, but the auditors also requested the same for some other external components. Not all products come with that feature, so we had to develop it. It was applied at the beginning of June, and we sent the evidence to the auditors.
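For readers less familiar with the idea, one common way to make log integrity verifiable is to chain entries with a keyed MAC, so that each entry's tag also covers the previous entry's tag. The sketch below illustrates that generic technique with Python's `hmac` and `hashlib`; it is not EJBCA's implementation nor the component we developed, and `append_entry`, `verify_log` and the key handling are hypothetical.

```python
import hashlib
import hmac
from typing import List, Tuple

Entry = Tuple[str, str]  # (log message, hex HMAC tag)


def _tag(key: bytes, prev_tag: str, message: str) -> str:
    """HMAC-SHA256 over the previous tag and the message, chaining entries."""
    data = (prev_tag + "\n" + message).encode("utf-8")
    return hmac.new(key, data, hashlib.sha256).hexdigest()


def append_entry(log: List[Entry], key: bytes, message: str) -> None:
    """Append a message whose tag depends on every earlier entry."""
    prev_tag = log[-1][1] if log else ""
    log.append((message, _tag(key, prev_tag, message)))


def verify_log(log: List[Entry], key: bytes) -> int:
    """Return the index of the first entry that fails, or -1 if all pass."""
    prev_tag = ""
    for index, (message, tag) in enumerate(log):
        if not hmac.compare_digest(_tag(key, prev_tag, message), tag):
            return index
        prev_tag = tag
    return -1
```

Editing or deleting an entry breaks verification at that point, but dropping entries from the end does not; detecting truncation requires recording the latest tag somewhere outside the log itself.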
Monitoring: StartCom's monitoring system was focused mainly on server status (CPU, load, memory, etc.) for all servers in the PKI infrastructure. Furthermore, all services provided were also monitored, because even if the server hosting a service was OK, the service itself also had to be live and running. In other words, we were checking service availability.
This finding is about updating the monitoring configuration to alert specific StartCom teams/people when sensitive configuration files of the infrastructure are changed. Internally, StartCom has a manual procedure: when changes/updates to these files are requested, they need the approval of the service manager before the system is modified/updated. The auditors wanted this to be covered automatically as well, so we updated the monitoring to immediately alert specific teams if sensitive configuration files are changed. We extended the scope to cover all systems, DBs, web servers, nginx, ... and are using a tool called inotify, which monitors and checks all write operations. This was done at the beginning of June.
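inotify delivers kernel notifications on write operations. A simpler, portable way to illustrate the same idea is to compare digests of the sensitive files between two snapshots; the sketch below uses that swapped-in polling approach, not our inotify setup, and `snapshot`, `changed_files`, the file paths and the alerting step are hypothetical.

```python
import hashlib
from typing import Dict, Iterable, List, Optional


def snapshot(paths: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each path to the SHA-256 of its contents, or None if it is missing."""
    result: Dict[str, Optional[str]] = {}
    for path in paths:
        try:
            with open(path, "rb") as f:
                result[path] = hashlib.sha256(f.read()).hexdigest()
        except FileNotFoundError:
            result[path] = None
    return result


def changed_files(before: Dict[str, Optional[str]],
                  after: Dict[str, Optional[str]]) -> List[str]:
    """Paths whose digest differs between two snapshots (added, edited or removed)."""
    return sorted(p for p in set(before) | set(after)
                  if before.get(p) != after.get(p))
```

A monitoring job would take a baseline snapshot after an approved change and alert the responsible team whenever `changed_files` is non-empty. Unlike inotify, polling only sees the state at each check, so a change that is made and reverted between two polls goes unnoticed.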
> * Certnomis chose to cross-sign StartCom while StartCom had audits with
> significant qualifications,
Well, the auditors explained in the reports that most of them were fixed. They are in the report because that is what the auditors found, and despite some discussions about all of them (I didn't agree with some), you can't consider them significant (I know that's an opinion), or at least doing so is contrary to the auditors' opinion. You can contact the auditors if you wish to ask for their opinion on the audit report, what they found, what we provided, the CAP, etc.
> and allowed them to recommence publicly-trusted issuance before they had demonstrated to Mozilla that they had met the
> remediation conditions required. While this may not have been against the
> letter of our requirements for StartCom to restart trusted operations, we feel
> it was not in the spirit of them.
>
I'm not sure how to interpret this. We followed our remediation plan and the Mozilla requirements; once we had met all of them, we reapplied for inclusion in Mozilla, which was in July (after the mis-issuances, security reports and audit findings), following the same steps as any other CA. We've seen others do the same in the past. Certinomis imposed some requirements on us, like having the WT audit certificates, whereas others may impose no requirement at all and simply cross-sign the new CA, which gets its audit in the meantime, for example.
> All in all, this attempt to start a new CA compares poorly with other recent
> executions of this process, such as those by Google and Amazon.
> While those companies do have significantly more resources than StartCom,
> many of the issues raised are questions of good practice, not of money.
>
I don't know how the ones you mention applied, but I remember lots of issues regarding Google's acquisition of the GlobalSign roots and how they proceeded.
Again, I don't know these examples, so I can't speak for them, and I don't like to talk about others, but it's different to start when you have customers requesting certificates and asking about everything that happened, etc., than to start from scratch when you have no customers yet, or very few, or when you have other roots accepted in the root programs from which you can provide your services normally. We were in a very difficult situation, under a lot of pressure, even from the Mozilla community: even though we had been distrusted (we didn't exist) and had not yet applied for re-inclusion, we were constantly being asked about, advised on and questioned on many things that perhaps are not asked of other CAs.
And yes, at the end of the day, it's a matter of money. You can't do many things, or you have to put new projects on hold, if you have no resources or they are limited. I'd like to do many things at StartCom at the same time; I have lots of projects in mind, but they can't be done because of that (lack of money). I don't think this is only a matter of good practice, because everything is related.
> Conclusion: StartCom's attempt to restart the CA was rushed.
Yes, I admit it.
> One could speculate why that was; perhaps due to a requirement to start generating
> income again.
Well, in the end this is a business, and I don't think anyone on this list is working for free. Everyone has his/her salary, etc. But that was not the main reason, because getting included in the root programs takes time anyway. We wanted to offer our customers, who supported us through the distrust (which IMHO was very aggressive in StartCom's case), a solution by building a fresh, clean system.
> But a process of building a production PKI by trial and error, revoking your mistakes, is entirely inappropriate.
I don't think we've built a trial-and-error PKI system. You quoted 40 out of 50000, and in this case the number does matter. If everything were trial and error, of course, it would be unacceptable, but I don't think that is the case. As said, some of those 40 are due to different "interpretations" that are still being discussed on the Mozilla list. Revoking is the first thing a CA must do, followed by investigating the issue and proposing countermeasures to prevent it from happening again. And this is what we have done.
And, by the same argument, looking at the issues described on the Mozilla list this summer, I don't think StartCom is doing that badly, judging either by the numbers or by the errors themselves. As said, we have implemented and improved many things: integrating cablint/x509lint into our processes, integrating crt.sh into our CMS, adding key validations, keeping up to date with all new EJBCA releases, etc.
> The qualified audits include missing or unimplemented processes, and audit/monitoring failures which
> lead to uncertainty as to how well the new roots were protected.
This has been explained, and I don't think it is accurate. Regarding the roots, they are very well protected: we held a root key ceremony witnessed by an auditor, with a final report in which everything was OK.
> This all shows that StartCom were not ready to start up the production PKI when they
> did. And yet Firefox today trusts tens of thousands of certificates issued by
> this PKI.
I'm not sure about this. We were distrusted in October 2016, and the new system, which is not related to the old one (now switched off), started operating in April 2017. None of the new certs are trusted in Mozilla Firefox, and we notify our users of this through messages on the website and in applications.
>
> Considering all this, our proposal is to require that StartCom begin again with
> new-new roots. These roots should be generated inside an already-security-
> validated infrastructure, as part of a new WT/BR audit process, the end
> results of which are clean because they already have all the policies and
> processes in place before the roots are generated.
> They should also build and use a parallel testing hierarchy, so that major
> operations done on the production PKI are done right, first time.
> Once they have generated new-new roots and intermediates, and got clean
> audits, they can re-re-apply for inclusion.
I'm not sure how to understand this requirement. We're required to generate new roots and intermediates, get a clean audit and then re-apply. So the only difference from what we have done is the clean audit, which I've already explained. Is this interpretation correct?
>
> No-one should be allowed to cross-sign this new hierarchy until, at minimum,
> Mozilla has pronounced itself satisfied that the 5 (or 6) remediation
> conditions which were imposed have been met. To permit otherwise is to
> allow the bypassing of Mozilla's requirements.
OK, I see this is a new requirement that was not imposed last time, when you recommended and allowed us to be cross-signed, as many other CAs have done in the past to stay in business.
We've met all the conditions: new system, new management, security audit, WebTrust audit and CT logging. Those conditions did not say that the WebTrust audit had to be clean, but, as indicated some time ago, we wanted a clean one and hence planned to perform a new audit (we told the auditors so). They asked us to wait until we had everything in place (we also said recently that only the TSA and BCP issues were pending), and then to wait another 2 months, as the WT requirements indicate.
>
> We should add the existing Certnomis cross-signs to OneCRL to revoke all the
> existing certificates. As of 10th August (now a month ago) StartCom said they
> have 50000 outstanding SSL certs which are valid due to the Certnomis cross-
> sign.
I've never said this. In fact, although we have had that cross-signed certificate since it was provided to us in July, we have never used it or provided it to any of our customers to build a trusted path. So none of those 50000 certs, or the newer ones, chain through the Certinomis path, because none of them have it. And all those 50000 certs are untrusted, because we're not in the Mozilla root store: the new root is not included, and the old one was distrusted.
In fact, I recently asked for permission to use the Certinomis cross-signed certificates and have had no response. I don't know whether this silence amounts to implied consent that would allow me to use it, but until we have clear direction we haven't used it.
> Revoking them all by adding intermediates to OneCRL would therefore lead to non-negligible disruption. But these were issued by an org whose
> most recent audits are qualified, which is under sanction, and about whose
> issuance practices and process safety there is a reasonable amount of doubt.
Again, I don't understand why you say this. We haven't used the Certinomis path, so there is no need to revoke anything. Regarding the audits being qualified, does this mean that only clean audits are valid? AFAIK, this is not a requirement of the Mozilla policy.
> We may allow a grace period for customers to replace them with certs from a
> trusted provider.
>
> We are not sure what to do about StartCom's poor quality PHP code. While
> continued use of it would cause us concern, we are not really in a position to
> request particular changes to it, or a complete rewrite, in a verifiable way. On
> the other hand, a security audit is a remediation condition, and the current
> codebase can hardly be said to have passed with flying colours.
I think this has been explained. I don't understand why you say it's "poor quality PHP code". As said, I admit that the first version was not good, but that's not the reality now, and it wasn't when we applied for re-inclusion. Are you going to check all the code of all CAs? I don't think so.
>
> We feel some sympathy for StartCom CEO Inigo Barreira, who has been
> placed in a difficult position since he took on the role,
Thanks, I really appreciate it.
> but we need to treat all CAs equally and fairly, according to our professional judgement. We would
> not accept this set of circumstances from other CAs, and so we feel we cannot
> accept it here.
Well, frankly, I think the requirements imposed on StartCom are not the same as those you require of other CAs. We've had additional requirements, and hence you're not treating all CAs equally. We accepted those requirements, and I think we've gone further than that (e.g. acquiring a new, well-known PKI solution like EJBCA), but it's true that the reasons for distrusting StartCom were mainly not technical, and we see (technical) issues at other CAs that are not examined so deeply. So I think StartCom is treated differently.
We applied in July, following all the required steps and meeting the conditions imposed. Some of the issues happened before the application; most of the audit findings were fixed before the application; we applied some of the countermeasures explained in mid-July, just after the application; and since the application we've made lots of improvements, as explained. I think Mozilla hasn't started handling our application yet, because we were waiting in the queue due to the number of applications.
So, when does this process start? I mean, are you dealing with historical issues from even before we applied? I think these questions have been raised on the Mozilla list, and this is a case where I think we're not treated equally.
There has been some discussion about StartCom during this time, digging up and posting unrelated things just to hurt us, even while we were distrusted and none of our certificates were valid. Again, the number matters: 50000 certificates for a distrusted CA is not a usual number, and these have been issued with the new system, just since April, and only 40 had issues, which, as mentioned, may not be so clear-cut and are still under discussion. Instead, the charges against us make it seem that we are a total disaster and don't deserve any trust or chance. If the community feels so, then I would like to see how others are treated when they make mistakes. And, once again, I will conclude that we're not being treated equally.