If a CA can't find its re-use of validation information in their audit
logs (as described in BR s5.4), then I believe that BR s5.4 was not
correctly implemented by that CA.
We’d like to offer our own perspective on this issue, having lived it firsthand, in case this perspective is valuable to the community.
It’s important to understand that while the total number of affected certificates was on the order of 100,000, the actual number of affected domains was about 1% of that. A few of these domain names simply appeared in a large number of certificates. That matters for two reasons. First, the exercise was to detect what ultimately turned out to be about 1,000 domains with a DCV problem among the vast number of domains for which we perform DCV every year. Second, the goal was to isolate and eliminate the certificates with one or more incorrectly validated domains, not to make a sweeping revocation that would have gotten all the affected domains and a large number of unaffected certificates as well.
This last point is important. It would have been fast and easy to create a query that caught 100% of this misissuance, but that query would also have revoked an order of magnitude more certificates that were perfectly fine. What slowed down the investigation was examining all domains in our corpus of active certificates for the many possible ways that DCV could have occurred. The tangled skein we had to investigate included these factors:
The key idea here is that the first DCV result that returns from an initial query may not be the only DCV event that actually occurred. We did have records of all these events, which is ultimately how we were able to execute this task, but as Ryan points out, this was one of those “data lake” situations, as we had to dig back into deeper records of our systems’ behavior.
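To make that idea concrete, here is a minimal sketch in Python. It is an illustration only, not a description of our production systems: the `DcvEvent` record, the function names, and the method labels are all hypothetical, and the 825-day window is simply the reuse period discussed in this thread. The point it shows is that a domain should be flagged only when none of its recorded DCV events falls inside the window, rather than when the first record found fails to.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Reuse period discussed in this thread; treated here as a plain constant.
REUSE_WINDOW = timedelta(days=825)


@dataclass(frozen=True)
class DcvEvent:
    """Hypothetical record of one successful domain control validation."""
    domain: str
    validated_at: datetime
    method: str  # illustrative label only, e.g. "dns" or "email"


def has_valid_dcv(domain, issued_at, events, window=REUSE_WINDOW):
    """Return True if ANY event for the domain falls within the window
    ending at issuance, not just the first event a query happens to return."""
    return any(
        e.domain == domain
        and timedelta(0) <= issued_at - e.validated_at <= window
        for e in events
    )


def suspect_domains(cert_domains, issued_at, events):
    """List the certificate's domains that have no qualifying DCV event."""
    return [d for d in cert_domains if not has_valid_dcv(d, issued_at, events)]
```

In this sketch a certificate is suspect only if `suspect_domains` returns a non-empty list, which is the narrower test, as opposed to flagging every certificate whose primary record looks wrong.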
It was, in fact, straightforward and reasonably fast to create that first list of suspect certificates for which we could not confirm that the “DCV reuse” had occurred within 825 days. In another circumstance that might have been the end of the query and we would have had our results. The problem here was that the reliance on DCV reuse was the very part of the system that was suspect, and so to put it under the magnifying glass we had to go to the very bottom of the data lake.
In other words, the fact that a particular certificate had DCV reuse marked incorrectly didn’t necessarily mean that DCV hadn’t occurred for that same domain in the specified time period, just that our primary record for that certificate didn’t indicate that this had happened. In response to that problem we have a ticket in to create a new table that will log our successful BR-compliant DCV checks in a manner that will make this kind of search considerably faster and easier to perform in the future.
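As a rough sketch of what such a table might look like, the Python example below uses the standard `sqlite3` module with an in-memory database. Every table, column, and function name here (`dcv_log`, `cert_domain`, `make_db`, `find_suspect_certs`) is an assumption made up for illustration, as is the schema itself; the actual design is still to be worked out. It shows how a log indexed by domain and validation time would let one query find the certificates that contain at least one domain with no DCV in the 825 days before issuance.

```python
import sqlite3


def make_db():
    """Create an in-memory database with a hypothetical DCV log table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE dcv_log (
            domain       TEXT NOT NULL,
            method       TEXT NOT NULL,
            validated_at TEXT NOT NULL   -- ISO date, e.g. 2018-06-01
        );
        -- Index so lookups by domain and time do not scan the whole log.
        CREATE INDEX idx_dcv_domain_time ON dcv_log (domain, validated_at);

        CREATE TABLE cert_domain (
            cert_id   INTEGER NOT NULL,
            domain    TEXT NOT NULL,
            issued_at TEXT NOT NULL      -- ISO date
        );
        """
    )
    return conn


def find_suspect_certs(conn, max_age_days=825):
    """Return ids of certificates having at least one domain with no
    logged DCV within max_age_days before issuance."""
    rows = conn.execute(
        """
        SELECT DISTINCT c.cert_id
        FROM cert_domain AS c
        WHERE NOT EXISTS (
            SELECT 1
            FROM dcv_log AS d
            WHERE d.domain = c.domain
              AND d.validated_at <= c.issued_at
              AND julianday(c.issued_at) - julianday(d.validated_at) <= ?
        )
        ORDER BY c.cert_id
        """,
        (max_age_days,),
    )
    return [row[0] for row in rows]
```

With a log of this shape the narrow question ("which certificates contain a domain with no qualifying DCV?") becomes one indexed query, instead of a reconstruction from deeper system records.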
Likewise, if the exercise had been to look at a single certificate or relatively few certificates, we could have found the answer very quickly, in the “minutes not days” that Watson asks about. However, if the request is for every large-volume, global CA to be able at any time to search every active certificate it has on any single, unpredictable criterion that may be thrown its way and get a result back in minutes, that is a very difficult thing to guarantee under any and all circumstances. Another way to frame the question: is the CA’s database meant to be something where all conceivable questions must be answerable immediately, or is it more reasonable to expect that for unexpected and complex questions involving large numbers of certificates the CA can perform a data investigation and return with answers after “days not minutes”?