
Incident report Certum CA: Corrupted certificates


Wojciech Trapczyński

Dec 3, 2018, 6:06:16 AM
to mozilla-dev-security-policy
Please find our incident report below.

This post links to https://bugzilla.mozilla.org/show_bug.cgi?id=1511459.

---

1. How your CA first became aware of the problem (e.g. via a problem
report submitted to your Problem Reporting Mechanism, a discussion in
mozilla.dev.security.policy, a Bugzilla bug, or internal self-audit),
and the time and date.

10.11.2018 10:10 (UTC±00:00) – We received a notification from our internal
monitoring system concerning issues with publishing CRLs.

2. A timeline of the actions your CA took in response. A timeline is a
date-and-time-stamped sequence of all relevant events. This may include
events before the incident was reported, such as when a particular
requirement became applicable, or a document changed, or a bug was
introduced, or an audit was done.

(All times in UTC±00:00)

10.11.2018 10:10 – We received a notification from our internal
monitoring system for issuing certificates and CRLs concerning issues
with publishing CRLs. We started verification.
10.11.2018 12:00 – We established that one of our roughly 50 CRLs had a
corrupted digital signature value. We noticed that this CRL was much
larger than the others. We verified that, within a short period of time,
over 30,000 certificates had been added to this CRL.
10.11.2018 15:30 – We confirmed that the signing module had trouble
signing CRLs larger than 1 MB. We started working on a fix.
10.11.2018 18:00 – We disabled the automatic publication of this CRL. We
verified that the other CRLs had correct signatures.
11.11.2018 07:30 – As part of the post-failure verification procedure,
we started an inspection of the whole system, including all certificates
issued during that time.
11.11.2018 10:00 – We found that a portion of the certificates issued
during that time had corrupted digital signatures.
11.11.2018 10:40 – We established that one of several signing modules
working in parallel was producing corrupted signatures. We turned it off.
11.11.2018 18:00 – We confirmed that the cause of the corrupted
certificate signatures was the large CRL, which prevented that signing
module from operating correctly afterwards.
11.11.2018 19:30 – We left only one signing module running, which
prevented further mis-issuance.
19.11.2018 11:00 – We deployed to production an additional digital
signature verification step in an external module, outside the signing
module.
19.11.2018 21:00 – We deployed to production a new version of the
signing module which correctly handles large CRLs.

3. Whether your CA has stopped, or has not yet stopped, issuing
certificates with the problem. A statement that you have will be
considered a pledge to the community; a statement that you have not
requires an explanation.

We stopped issuing certificates with the problem on 11.11.2018 17:47
(UTC±00:00).

4. A summary of the problematic certificates. For each problem: number
of certs, and the date the first and last certs with that problem were
issued.

355 certificates.

The first one: 10.11.2018 01:26:10
The last one: 11.11.2018 17:47:36

All certificates were revoked.

5. The complete certificate data for the problematic certificates. The
recommended way to provide this is to ensure each certificate is logged
to CT and then list the fingerprints or crt.sh IDs, either in the report
or as an attached spreadsheet, with one list per distinct problem.

Full list of certificates in attachment.

6. Explanation about how and why the mistakes were made or bugs
introduced, and how they avoided detection until now.

The root cause of the signing module's incorrect operation was the lack
of proper handling of large CRLs, greater than 1 MB. When the signing
module received such a large list to sign, it was not able to sign it
correctly. In addition, the signing module then started to sign
incorrectly the remaining objects it received for signing later, i.e.
after receiving the large CRL for signature.

Because we were using several signing modules simultaneously at the time
the problem occurred, the problem did not affect all certificates issued
during that time. Our analysis shows that the problem affected about 10%
of all certificates issued during that time.

We have been using this signing module for the last few years, and at the
time of its implementation the tests did not cover creating a signature
for such a large CRL. None of our CRLs for SSL certificates had exceeded
100 KB before. The significant increase in the size of this CRL was
caused by the mass revocation of certificates by one of our partners (the
revocations were due to business reasons). In a short time, almost 30,000
certificates were added to the CRL, which is extremely rare.

All issued certificates were unusable due to corrupted signature.

7. List of steps your CA is taking to resolve the situation and ensure
such issuance will not be repeated in the future, accompanied with a
timeline of when your CA expects to accomplish these things.

We have deployed a new version of the signing module that correctly
signs large CRLs. From now on, we are able to sign CRLs of up to 128
MB. In addition, we have improved the part of the signing module
responsible for verification of signatures (at the time of failure it
did not work properly).

We have deployed additional verification of certificate and CRL
signatures in the external component, in addition to the signing module.
This module blocks the issuance of certificates and CRLs that have a
corrupted signature.
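
For illustration only, a minimal sketch of such a pre-publication check
(shown here in Python with the cryptography library; the function and
names are hypothetical and our production code is different):

    # Hypothetical sketch of an external publication gate: re-verify the CRL
    # signature with an independent library before it may be published.
    from cryptography import x509

    def may_publish_crl(crl_der: bytes, issuer_cert_pem: bytes) -> bool:
        """Return True only if the CRL signature verifies against the issuer key."""
        crl = x509.load_der_x509_crl(crl_der)
        issuer = x509.load_pem_x509_certificate(issuer_cert_pem)
        # is_signature_valid() recomputes the hash over the TBS bytes itself,
        # so it does not depend on any state inside the signing module.
        return crl.is_signature_valid(issuer.public_key())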

We have extended the monitoring system with tests that will allow us to
detect incorrectly signed certificates or CRLs faster.

---

Jakob Bohm

Dec 3, 2018, 6:47:46 PM
to mozilla-dev-s...@lists.mozilla.org
Question 1: Was there a period during which this issuing CA had no
validly signed non-expired CRL due to this incident?

Question 2: How long were ordinary revocations (via CRL) delayed by
this incident?

Question 3: Was Certum's OCSP handling for any issuing or root CA affected
by this incident (for example, were any OCSP responses incorrectly
signed?, were OCSP servers not responding? were OCSP servers returning
outdated revocation data until the large-CRL signing was operational on
2018-11-19 21:00 UTC ?)

Recommended additional precautions for all CAs (not just Certum):

- Ensure that your CRL and revocation processes (including CRL signing,
OCSP-response pre-signing, database sizes etc.) can handle the
hypothetical extreme of all issued certificates being revoked. For
example, if an issuing Intermediary CA is configured to not issue more
than 1 million certificates, the associated revocation mechanisms should
be configured and tested to handle 1 million revocations (a sketch of
such a drill follows these notes).

- Maintain at least one hierarchy of not-publicly-trusted CAs that run
on the same platform (or an exact clone in a staging environment) and
routinely run such worst case scenarios through it.

Note that the first proposed precaution can be achieved by rolling new
Intermediary CAs more often (e.g. every 20000 certs for a 20000 cert CRL
signing limit) or by increasing the revocation capacity at the CA's
discretion. Of course, once an Intermediary CA has issued a certum number
of certificates, the capacity to revoke them all cannot be denied.

Also note that the worst-case scenario is not the performance
optimization point; it is OK if running in this mode entails horribly
slow performance, as long as it stays within the absolute maximums set by
standards, BRs, CPS etc. For example, OCSP responders might start taking
seconds to return each response, and the CRL download webserver might slow
to a crawl. The ability to sign new CRLs may slow to one every 23 hours and
59 minutes. This is why running these tests on non-production hardware
(with no physical security) is probably a smart choice.
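
Purely as an illustration of the first precaution (Python's cryptography
library, a throw-away test key and name; adapt it to whatever software
your CA actually runs), such a worst-case drill could be as simple as:

    # Hypothetical worst-case drill: revoke as many serials as the issuing CA
    # could ever issue and confirm the CRL can still be produced and signed.
    import datetime
    from cryptography import x509
    from cryptography.x509.oid import NameOID
    from cryptography.hazmat.primitives import hashes
    from cryptography.hazmat.primitives.asymmetric import ec
    from cryptography.hazmat.primitives.serialization import Encoding

    def build_worst_case_crl(revoked_count: int) -> bytes:
        key = ec.generate_private_key(ec.SECP256R1())  # test key, never production
        issuer = x509.Name([x509.NameAttribute(NameOID.COMMON_NAME, "Test Issuing CA")])
        now = datetime.datetime.now(datetime.timezone.utc)
        builder = (x509.CertificateRevocationListBuilder()
                   .issuer_name(issuer)
                   .last_update(now)
                   .next_update(now + datetime.timedelta(days=7)))
        for serial in range(1, revoked_count + 1):
            builder = builder.add_revoked_certificate(
                x509.RevokedCertificateBuilder()
                .serial_number(serial)
                .revocation_date(now)
                .build())
        return builder.sign(key, hashes.SHA256()).public_bytes(Encoding.DER)

    # e.g. len(build_worst_case_crl(1_000_000)) shows how large the CRL gets
    # if every certificate the CA is allowed to issue ends up revoked.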



Enjoy

Jakob
--
Jakob Bohm, CIO, Partner, WiseMo A/S. https://www.wisemo.com
Transformervej 29, 2860 Søborg, Denmark. Direct +45 31 13 16 10
This public discussion message is non-binding and may contain errors.
WiseMo - Remote Service Management for PCs, Phones and Embedded

Wojciech Trapczyński

Dec 4, 2018, 1:24:38 AM
to Jakob Bohm, mozilla-dev-s...@lists.mozilla.org
Thank you. The answers to your questions below.

On 04.12.2018 00:47, Jakob Bohm via dev-security-policy wrote:
> On 03/12/2018 12:06, Wojciech Trapczyński wrote:
> Question 1: Was there a period during which this issuing CA had no
> validly signed non-expired CRL due to this incident?
>

Between 10.11.2018 01:05 (UTC±00:00) and 14.11.2018 07:35 (UTC±00:00) we
were serving one CRL with a corrupted signature.

> Question 2: How long were ordinary revocations (via CRL) delayed by
> this incident?
>

There was no delay in ordinary revocations. All CRLs were generated and
published in accordance with the CA/Browser Forum Baseline Requirements.

> Question 3: Was Certum's OCSP handling for any issuing or root CA affected
> by this incident (for example, were any OCSP responses incorrectly
> signed?, were OCSP servers not responding? were OCSP servers returning
> outdated revocation data until the large-CRL signing was operational on
> 2018-11-19 21:00 UTC ?)
>

No, OCSP was not impacted. We were serving correct OCSP responses all
the time.

Kurt Roeckx

Dec 4, 2018, 4:01:32 AM
to mozilla-dev-s...@lists.mozilla.org
On 2018-12-04 7:24, Wojciech Trapczyński wrote:
>> Question 1: Was there a period during which this issuing CA had no
>>    validly signed non-expired CRL due to this incident?
>>
>
> Between 10.11.2018 01:05 (UTC±00:00) and 14.11.2018 07:35 (UTC±00:00) we
> were serving one CRL with a corrupted signature.

Do you have any plans to prevent serving CRLs with an invalid signature
and keep the old CRL in place until you have a valid one?


Kurt

Wojciech Trapczyński

Dec 4, 2018, 4:25:50 AM
to Kurt Roeckx, mozilla-dev-s...@lists.mozilla.org
On 04.12.2018 10:01, Kurt Roeckx via dev-security-policy wrote:
> On 2018-12-04 7:24, Wojciech Trapczyński wrote:
>>> Question 1: Was there a period during which this issuing CA had no
>>>    validly signed non-expired CRL due to this incident?
>>>
>>
>> Between 10.11.2018 01:05 (UTC±00:00) and 14.11.2018 07:35 (UTC±00:00)
>> we were serving one CRL with a corrupted signature.
>
> Do you have any plans to prevent serving CRLs with an invalid signature
> and keep the old CRL in place until you have a valid one?

This one CRL with a corrupted signature was being served between the
dates I mentioned. Since November 14th 07:35 (UTC±00:00) we have been
serving a CRL with a valid signature. I have described this in the
Bugzilla bug (https://bugzilla.mozilla.org/show_bug.cgi?id=1511459#c2).

Kurt Roeckx

Dec 4, 2018, 9:16:41 AM
to mozilla-dev-s...@lists.mozilla.org
I think you misunderstood my question. I think you should never serve an
invalid file. I think it's better to have a file that is 1 or 2 days old
than it is to have an invalid file. So you could check that it's a valid
file before you start serving it, and if it's invalid, keep the old file.


Kurt

Wojciech Trapczyński

Dec 4, 2018, 10:02:29 AM
to Kurt Roeckx, mozilla-dev-s...@lists.mozilla.org
On 04.12.2018 15:16, Kurt Roeckx via dev-security-policy wrote:
> I think you misunderstood my question. I think you should never serve an
> invalid file. I think it's better to have a file that is 1 or 2 days old
> than it is to have an invalid file. So you could check that it's a valid
> file before you start serving it, and if it's invalid, keep the old file.
As I mentioned in the incident report, we have deployed additional
verification of certificate and CRL signatures in the external
component, in addition to the signing module. This module blocks the
issuance of certificates and CRLs that have an invalid signature.

Ryan Sleevi

Dec 4, 2018, 1:15:06 PM
to wtrapc...@certum.pl, mozilla-dev-security-policy
Thanks for filing this, Wojciech. This is definitely one of the better
incident reports in terms of providing details and structure, while also
speaking to the steps the CA has taken in response. There was sufficient
detail here that I don't have a lot of questions - if anything, it sounds
like it has resulted in a number of best practices that all CAs should
abide by. The few questions I do have are inline below:

On Mon, Dec 3, 2018 at 6:06 AM Wojciech Trapczyński via dev-security-policy
<dev-secur...@lists.mozilla.org> wrote:

> (All times in UTC±00:00)
>
> 10.11.2018 10:10 – We received a notification from our internal
> monitoring system for issuing certificates and CRLs concerning issues
> with publishing CRLs. We started verification.

Understanding what system you had in place beforehand is valuable in
understanding what changes you propose to make. In particular, in
remediation, you note "We have deployed additional verification of
certificate and CRL signatures in the external component"

It's unclear here what the monitoring system monitored, or what the
challenges in publishing were. It sounds like there was already monitoring
in place in the internal system that detected the issue with corrupted
signatures. Is that a misunderstanding? I could also see an interpretation
being that "It was having trouble publishing large files", which would seem
a different issue.

Thus, it's helpful if you could discuss a bit more about what this
monitoring system already monitors, and how you're improving it to catch
this sort of issue. This may reveal other possible gaps, or it may be so
comprehensive as to also serve as a best practice that all CAs should
follow. In either event, the community wins :)


> 6. Explanation about how and why the mistakes were made or bugs
> introduced, and how they avoided detection until now.
>
<snip>

> All issued certificates were unusable due to corrupted signature.
>

Could you speak to more about how you assessed this? An incorrect signature
on the CRL would not necessarily prevent the certificate from being used;
it may merely prevent it from being revoked. That is, all 30,000 (revoked)
certificates may have been usable due to the corrupted signature.


> 7. List of steps your CA is taking to resolve the situation and ensure
> such issuance will not be repeated in the future, accompanied with a
> timeline of when your CA expects to accomplish these things.
>
> We have deployed a new version of the signing module that correctly
> signs large CRLs. From now on, we are able to sign CRLs of up to 128
> MB. In addition, we have improved the part of the signing module
> responsible for verification of signatures (at the time of failure it
> did not work properly).
>
> We have deployed additional verification of certificate and CRL
> signatures in the external component, in addition to the signing module.
> This module blocks the issuance of certificates and CRLs that have a
> corrupted signature.
>
> We have extended the monitoring system with tests that will allow us to
> detect incorrectly signed certificates or CRLs faster.

As others have highlighted, there is still an operational gap, in that 1MB
CRLs are rather large and unwieldy. To help manage this, CRLs support
"sharding", by virtue of the CRL distribution point URL and the (critical)
CRL extension of Issuing Distribution Point (
https://tools.ietf.org/html/rfc5280#section-5.2.5 ). For example, the same
(Subject DN + key) intermediate CA can divide the certificates it issues
into an arbitrary number of CRLs. It does this by ensuring distinct URLs in
the certificates' CRLDP extension, and then, for each of the URLs
referenced, hosting a CRL for all certificates bearing that URL, and with a
critical IDP extension in the CRL (ensuring the IDP is present and critical
is a critical security function).

By doing this, you can roll a new CRL for every X number of subscriber
certificates you've issued, allowing you to bound the worst-case
revocation. For example, if the average size of your CRL entry was 32 bytes
(easier for the math), then every 2,000 certificates, you could create a
new CRL URL, and the maximum size your CRL would be (in the worst case) for
those 2,000 certificates is 64K.
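
To make the mechanism concrete, here's a rough sketch (Python's
cryptography library; the URL scheme and shard numbering are invented for
the example rather than anyone's actual practice):

    # Illustrative sketch: build one CRL shard carrying a critical Issuing
    # Distribution Point (IDP) extension matching the CRLDP URL placed in the
    # certificates assigned to this shard. URL and shard number are made up.
    import datetime
    from cryptography import x509
    from cryptography.hazmat.primitives import hashes

    def build_crl_shard(issuer_name, issuer_key, revoked_serials, shard: int):
        url = f"http://crl.example.com/shard-{shard}.crl"  # hypothetical scheme
        now = datetime.datetime.now(datetime.timezone.utc)
        builder = (x509.CertificateRevocationListBuilder()
                   .issuer_name(issuer_name)
                   .last_update(now)
                   .next_update(now + datetime.timedelta(days=7)))
        for serial in revoked_serials:
            builder = builder.add_revoked_certificate(
                x509.RevokedCertificateBuilder()
                .serial_number(serial)
                .revocation_date(now)
                .build())
        idp = x509.IssuingDistributionPoint(
            full_name=[x509.UniformResourceIdentifier(url)],
            relative_name=None,
            only_contains_user_certs=False,
            only_contains_ca_certs=False,
            only_some_reasons=None,
            indirect_crl=False,
            only_contains_attribute_certs=False,
        )
        # The IDP extension must be critical so relying parties cannot treat a
        # shard as if it covered the CA's entire certificate population.
        builder = builder.add_extension(idp, critical=True)
        return builder.sign(private_key=issuer_key, algorithm=hashes.SHA256())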

Have you considered such an option? Several other CAs already apply this
practice, at varying degrees of scale and size, but it seems like it would
further mitigate the root cause, in that the revocation of 30,000
certificates would not balloon the CRL so much.

Kurt Roeckx

Dec 4, 2018, 2:08:43 PM
to ry...@sleevi.com, wtrapc...@certum.pl, mozilla-dev-security-policy
On Tue, Dec 04, 2018 at 01:14:44PM -0500, Ryan Sleevi via dev-security-policy wrote:
>
> > All issued certificates were unusable due to corrupted signature.
> >
>
> Could you speak to more about how you assessed this? An incorrect signature
> on the CRL would not necessarily prevent the certificate from being used;
> it may merely prevent it from being revoked. That is, all 30,000 (revoked)
> certificates may have been usable due to the corrupted signature.

He explained before that the module that generated the corrupt
signature for the CRL was in a weird state after that and all
the newly issued certificates signed by that module also had
corrupt signatures.


Kurt

Ryan Sleevi

Dec 4, 2018, 3:10:39 PM
to Kurt Roeckx, Ryan Sleevi, wtrapc...@certum.pl, mozilla-dev-security-policy
On Tue, Dec 4, 2018 at 2:08 PM Kurt Roeckx <ku...@roeckx.be> wrote:

> He explained before that the module that generated the corrupt
> signature for the CRL was in a weird state after that and all
> the newly issued certificates signed by that module also had
> corrupt signatures.
>

Ah! Thanks, I misparsed that. I agree, it does seem to be clearly addressed
:)

Wojciech Trapczyński

Dec 5, 2018, 7:53:20 AM
to ryan....@gmail.com, mozilla-dev-security-policy
Ryan, thank you for your comment. The answers to your questions below:

There are two things here: how we monitor our infrastructure and how our
software operates.

Our system for issuing and managing certificates and CRLs has a module
responsible for monitoring any issue which may occur while generating a
certificate or CRL. The main task of this module is to inform us that
"something went wrong" during the process of issuing a certificate or CRL.
In this case we got a notification that several CRLs had not been
published. This monitoring did not inform us about the corrupted signature
in one CRL; it only indicated that there were some problems with CRLs. To
identify the source of the problem, human action was required.

Additionally, we have the main monitoring system with thousands of tests
of the whole infrastructure. For example, in the case of CRLs we have
tests such as checking the HTTP status code, download time, the
NextUpdate date, and others. After the incident we added tests which
allow us to quickly detect CRLs published with an invalid signature (we
are using a simple OpenSSL-based script).
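
As a rough illustration of what that check does (shown here in Python with
the cryptography library rather than our actual OpenSSL-based script; the
URL in the example is a placeholder, not one of our real distribution
points):

    # Fetch a published CRL and confirm its signature verifies against the
    # issuing CA certificate; the monitoring trigger fires when this is False.
    import urllib.request
    from cryptography import x509

    def published_crl_signature_is_valid(crl_url: str, issuer_cert_pem: bytes) -> bool:
        with urllib.request.urlopen(crl_url, timeout=30) as response:
            crl = x509.load_der_x509_crl(response.read())
        issuer = x509.load_pem_x509_certificate(issuer_cert_pem)
        return crl.is_signature_valid(issuer.public_key())

    # e.g. published_crl_signature_is_valid("http://crl.example.com/ca.crl",
    #                                        issuer_pem)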

The sentence "We have deployed additional verification of certificate
and CRL signatures in the external component" applies to the changes we
have made in the software. After the incident we added verification of
the signatures of certificates and CRLs. These improvements were added to
software that works independently of the signing module which was the
source of the problem.

As I described in the incident report, we have also improved the part of
the signing module responsible for signature verification, because at the
time of the failure it did not work properly.

>> 6. Explanation about how and why the mistakes were made or bugs
>> introduced, and how they avoided detection until now.
>>
> <snip>
>
>> All issued certificates were unusable due to corrupted signature.
>>
>
> Could you speak to more about how you assessed this? An incorrect signature
> on the CRL would not necessarily prevent the certificate from being used;
> it may merely prevent it from being revoked. That is, all 30,000 (revoked)
> certificates may have been usable due to the corrupted signature.
>
>

Kurt has explained it well. Kurt, thank you.

Thank you for pointing that out. We have not considered it yet, but it
seems to be a good solution for such cases. We would have to estimate
what changes would be required to implement it.

Ryan Sleevi

Dec 5, 2018, 3:27:02 PM
to wtrapc...@certum.pl, mozilla-dev-security-policy
On Wed, Dec 5, 2018 at 7:53 AM Wojciech Trapczyński <wtrapc...@certum.pl>
wrote:

> Ryan, thank you for your comment. The answers to your questions below:
>

Again, thank you for filing a good post-mortem.

I want to call out a number of positive things here rather explicitly, so
that it hopefully can serve as a future illustration from CAs:
* The timeline included the times, as requested and required, which helps
provide a picture as to how responsive the CA is
* It includes the details about the steps the CA actively took during the
investigation (e.g. within 1 hour, 50 minutes, the initial cause had been
identified)
* It demonstrates an approach that triages (10.11.2018 12:00), mitigates
(10.11.2018 18:00), and then further investigates (11.11.2018 07:30) the
holistic system. Short-term steps are taken (11.11.2018 19:30), followed by
longer term steps (19.11.2018)
* It provides rather detailed data about the problem, how the problem was
triggered, the scope of the impact, why it was possible, and what steps are
being taken.

That said, I can't say positive things without highlighting opportunities
for improvement:
* It appears you were aware of the issue beginning on 10.11.2018, but the
notification to the community was not until 03.12.2018 - that's a
significant gap. I see Wayne already raised it in
https://bugzilla.mozilla.org/show_bug.cgi?id=1511459#c1 and that has been
responded to in https://bugzilla.mozilla.org/show_bug.cgi?id=1511459#c2
* It appears, based on that bug and related discussion (
https://bugzilla.mozilla.org/show_bug.cgi?id=1511459#c2 ), that from
10.11.2018 01:05 (UTC±00:00) to 14.11.2018 07:35 (UTC±00:00) an invalid
CRL was being served. That seems relevant for the timeline, as it speaks to
the period of CRL non-compliance. In this regard, I think we're talking
about two different BR "violations" that share the same incident root cause
- a set of invalid certificates being published and a set of invalid CRLs
being published. Of these two, the latter is far more impactful than the
former, but it's unclear based on the report if the report was being made
for the former (certificates) rather than the latter (CRLs)

Beyond that, a few selected remarks below.


> There are two things here: how we monitor our infrastructure and how our
> software operates.
>
> Our system for issuing and managing certificates and CRLs has a module
> responsible for monitoring any issue which may occur while generating a
> certificate or CRL. The main task of this module is to inform us that
> "something went wrong" during the process of issuing a certificate or CRL.
> In this case we got a notification that several CRLs had not been
> published. This monitoring did not inform us about the corrupted signature
> in one CRL; it only indicated that there were some problems with CRLs. To
> identify the source of the problem, human action was required.
>

Based on your timeline, it appears the issue was introduced at 10.11.2018
01:05 and not alerted on until 10.11.2018 10:10. Is that correct? If so,
can you speak to why the delay between the issue and notification, and what
the target delay is with the improvements you're making? Understanding that
alerting is finding a balance between signal and noise, it does seem like a
rather large gap. It may be that this gap is reflective of 'on-call' or
'business hours', it may be a threshold in the number of failures, it may
have been some other cause, etc. Understanding a bit more can help here.


> Additionally, we have the main monitoring system with thousands of tests
> of the whole infrastructure. For example, in the case of CRLs we have
> tests such as checking the HTTP status code, download time, the
> NextUpdate date, and others. After the incident we added tests which
> allow us to quickly detect CRLs published with an invalid signature (we
> are using a simple OpenSSL-based script).
>

So, this is an example of a good response. It includes a statement that
requires trust ("we have ... thousands of tests"), but then provides
examples that demonstrate an understanding and awareness of the potential
issues.

Separate from the incident report, I think publishing or providing details
about these tests could be a huge benefit to the community, with an ideal
outcome of codifying them all as requirements that ALL CAs should perform.
This is where we go from "minimum required" to "best practice", and it
sounds like y'all are operating at a level that seeks to capture the spirit
and intent, and not just the letter, and that's the kind of ideal
requirement to codify and capture.


> As I described in the incident report, we have also improved the part of
> the signing module responsible for signature verification, because at the
> time of the failure it did not work properly.
>

This is an area where I think more detail could help. Understanding what
caused it to "not work properly" seems useful in understanding the issues
and how to mitigate. For example, it could be that "it did not work
properly" because "it was never configured to be enabled", it could be that
"it did not work properly" because "a bug was introduced and the code is
not tested", or.. really, any sort of explanation. Understanding why it
didn't work and how it's been improved helps everyone understand and,
hopefully, operationalize best practices.

Wojciech Trapczyński

Dec 10, 2018, 7:42:35 AM
to ryan....@gmail.com, mozilla-dev-security-policy
> responded to in https://bugzilla.mozilla.org/show_bug.cgi?id=1511459#c2
Yes, that is correct. The monitoring system that we are using in our
software for issuing and managing certificates and CRLs has no
notification feature. Reviewing events from it is a required part of the
procedure; in other words, to detect any issue in this monitoring, human
action is required. That is why we detected this issue with some delay.

Therefore, we have added tests to our main monitoring system and we now
receive a notification within less than 5 minutes of the occurrence of
the event.

>> Additionally, we have the main monitoring system with thousands of tests
>> of the whole infrastructure. For example, in the case of CRLs we have
>> tests such as checking the HTTP status code, download time, the
>> NextUpdate date, and others. After the incident we added tests which
>> allow us to quickly detect CRLs published with an invalid signature (we
>> are using a simple OpenSSL-based script).
>>
> So, this is an example of a good response. It includes a statement that
> requires trust ("we have ... thousands of tests"), but then provides
> examples that demonstrate an understanding and awareness of the potential
> issues.
>
> Separate from the incident report, I think publishing or providing details
> about these tests could be a huge benefit to the community, with an ideal
> outcome of codifying them all as requirements that ALL CAs should perform.
> This is where we go from "minimum required" to "best practice", and it
> sounds like y'all are operating at a level that seeks to capture the spirit
> and intent, and not just the letter, and that's the kind of ideal
> requirement to codify and capture.
>
>

We are using Zabbix – The Enterprise-Class Open Source Network
Monitoring Solution (https://www.zabbix.com/).

For ordinary tests we are using functions built into Zabbix, for example:

- "Simple checks" for monitor things like ICMP ping, TCP/UDP service
availability;
- "Web scenarios" for monitor things like HTTP response status code,
download speed, response time.

For the uncommon tests that every CA has to deal with, we are using our
own scripts that we have embedded in Zabbix. For all tests we have defined
sets of triggers that launch appropriate actions.
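
For example, a simplified external check of the kind we embed in Zabbix
could look as follows (a sketch only; the command-line interface and the
12-hour margin are illustrative, not our real configuration):

    # Hypothetical external check: print "1" if the CRL's NextUpdate is still
    # comfortably in the future, "0" otherwise; a Zabbix trigger fires on 0.
    import datetime
    import sys
    import urllib.request
    from cryptography import x509

    def main() -> None:
        crl_url = sys.argv[1]
        with urllib.request.urlopen(crl_url, timeout=30) as response:
            crl = x509.load_der_x509_crl(response.read())
        margin = datetime.timedelta(hours=12)  # illustrative freshness margin
        fresh = crl.next_update - datetime.datetime.utcnow() > margin
        print(1 if fresh else 0)

    if __name__ == "__main__":
        main()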

Of course, we are willing to share details of our tests as part of
creating the best practices that all CAs should follow. I guess that a
lot of CAs have similar tests in their infrastructure and sharing them
will be valuable for all.

>> As I described in the incident report, we have also improved the part of
>> the signing module responsible for signature verification, because at the
>> time of the failure it did not work properly.
>>
> This is an area where I think more detail could help. Understanding what
> caused it to "not work properly" seems useful in understanding the issues
> and how to mitigate. For example, it could be that "it did not work
> properly" because "it was never configured to be enabled", it could be that
> "it did not work properly" because "a bug was introduced and the code is
> not tested", or.. really, any sort of explanation. Understanding why it
> didn't work and how it's been improved helps everyone understand and,
> hopefully, operationalize best practices.
>

The technical issue was an incorrect calculation of the hash of the
object. Unfortunately, this incorrect hash calculation was also used
during the verification of the signature. Therefore, the corrupted
signatures were not detected. As I described in the incident report, the
tests of this software did not cover creating a signature for such a
large CRL, and for that reason the bug avoided detection until now.

The first thing we fixed was the signing module itself. We made changes
that allow us to correctly sign large objects and to verify their
signatures in the correct way. Then, to eliminate the risk, we decided to
add signature verification in another component of our system. This gives
us certainty that even if the signing module fails again, we will not
repeat the same mistake, and all invalid certificates or CRLs will be
blocked.
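
To illustrate the idea (a sketch only, in Python with the cryptography
library rather than the code we actually run; it assumes an RSA-signed
certificate), the second component recomputes the hash from the re-parsed
object itself instead of reusing anything produced by the signing module:

    # Sketch of the independent check: re-parse the issued certificate and
    # verify its signature from scratch, so a digest computed incorrectly
    # inside the signing module can never be reused to "confirm" itself.
    from cryptography import x509
    from cryptography.exceptions import InvalidSignature
    from cryptography.hazmat.primitives.asymmetric import padding

    def certificate_signature_is_valid(cert_der: bytes, issuer_der: bytes) -> bool:
        cert = x509.load_der_x509_certificate(cert_der)
        issuer = x509.load_der_x509_certificate(issuer_der)
        try:
            issuer.public_key().verify(
                cert.signature,
                cert.tbs_certificate_bytes,   # hash is recomputed over these bytes
                padding.PKCS1v15(),           # assumes an RSA-signed certificate
                cert.signature_hash_algorithm,
            )
            return True
        except InvalidSignature:
            return False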
