Hi
My organisation has been (trying to) use queries derived from crt.sh code to try to compute how many certs we've issued against a particular domain so we can alert when we approach Let's Encrypt rate limits.
This code:
AND NOT EXISTS (
SELECT 1
FROM certificate c2
WHERE x509_serialNumber(c2.CERTIFICATE) = x509_serialNumber(cai.CERTIFICATE)
AND c2.ISSUER_CA_ID = cai.ISSUER_CA_ID
AND c2.ID < cai.CERTIFICATE_ID
AND x509_tbscert_strip_ct_ext(c2.CERTIFICATE) = x509_tbscert_strip_ct_ext(cai.CERTIFICATE)
LIMIT 1
)
appears to try to detect pre-certificates by finding certs with the same serial and extension-stripped payload. But this method is very inefficient because of the use of a correlated subquery and a CPU-expensive comparison of a large blob of bytea data that must be extracted from each candidate certificate for comparison.
The correlated subquery has a LIMIT clause which prevents postgres from optimising it into a left-anti-join.
It can be expressed MUCH more efficiently as a left anti-join like
LEFT JOIN certificate c2 ON (
x509_serialNumber(c2.CERTIFICATE) = x509_serialNumber(cai.CERTIFICATE)'
AND c2.ISSUER_CA_ID = cai.ISSUER_CA_ID'
AND c2.ID < cai.CERTIFICATE_ID'
AND x509_tbscert_strip_ct_ext(c2.CERTIFICATE) = x509_tbscert_strip_ct_ext
)
-- Only retain rows of c1 where no corresponding c2 was found
but for this case there's an even easier and more efficient way that avoids the need to join entirely. The query can filter-out pre-certificates with the following SQL term that checks for the pre-certificate x.509 critical extension that "poision"s them for clients, and discards the certs if the extension is present:
AND NOT x509_hasExtension(CERTIFICATE, '1.3.6.1.4.1.11129.2.4.3', true)
This can be made cleaner and more readable (IMO) if expressed as
LEFT JOIN LATERAL x509_hasExtension(CERTIFICATE, '1.3.6.1.4.1.11129.2.4.3', true) AS precert_status(is_precertificate)
...
other sql here
...
WHERE
...
AND NOT is_precertificate
so that you can also easily surface an "is_precertificate" flag in the SELECT output clause for other analytical needs.
It'd also be helpful if a comment in the code there explicitly stated that the dedup in question is to exclude pre-certs, it's _not_ to detect and exclude cert renewals, as I was confused by that for a while. I now understand that at least for Lets Encrypt there are never actually any true renewals, every cert is a new cert issued for the same FQDN set (and even if there were renewals they'd have a different serial).