Hi Steven.
AND NOT EXISTS (
SELECT 1
FROM certificate c2
WHERE x509_serialNumber(c2.CERTIFICATE) = x509_serialNumber(cai.CERTIFICATE)
AND c2.ISSUER_CA_ID = cai.ISSUER_CA_ID
AND c2.ID < cai.CERTIFICATE_ID
AND x509_tbscert_strip_ct_ext(c2.CERTIFICATE) = x509_tbscert_strip_ct_ext(cai.CERTIFICATE)
LIMIT 1
)
It ensures that everything except the lowest crt.sh ID in the set (of precertificate and/or corresponding certificate) is filtered out.
Since crt.sh IDs are allocated incrementally, and since precertificates exist before their corresponding certificates, it's not surprising (although it's certainly not guaranteed) that a precertificate will tend to have a lower crt.sh ID than the corresponding certificate.
In my opinion, yes. Returning the first match gives a more predictable result, because you never know if/when a further match might become known to crt.sh. I've found this predictability to be useful for some use cases.
You could change the "<" for ">" in that part of the query to make it always return the highest crt.sh ID. To make the query always prefer a certificate (if known to the log) over the corresponding precertificate would require more coding effort.