Can Crt.sh query interfaces support non-english character ?

222 views
Skip to first unread message

Ethan Wang

unread,
Mar 30, 2021, 11:12:05 PM3/30/21
to crt.sh
I used following interface to query list of certificates used by an oragnization,
unfortunately, this oragnization use chinese character and an empty list is returned.

Is there any workaround? Thanks

in...@sql-und-xml.de

unread,
Mar 31, 2021, 2:01:34 AM3/31/21
to crt.sh
You have to use the ASCII-version of the domain name. Something with xn--. That should always work.

Ethan Wang

unread,
Mar 31, 2021, 6:12:30 AM3/31/21
to crt.sh
I was query the interface by organization name, not  the domain name, so it doesn't  work for my case,

r...@sectigo.com

unread,
Mar 31, 2021, 8:12:29 AM3/31/21
to crt.sh
Hi Ethan.  It looks like this is a limitation of the current implementation.

The certificate identities Full Text Search index uses the identities() function, which converts each raw Subject/SAN identity into text using decode(<rawidentity>, 'escape') - see https://github.com/crtsh/certwatch_db/blob/master/fnc/identities.fnc#L36.  Per the postgres documentation, "escape converts zero bytes and high-bit-set bytes to octal sequences (\nnn) and doubles backslashes." (https://www.postgresql.org/docs/12/functions-string.html).

Query: select identities(certificate) from certificate where id=2936539755;
Result:  '002' '012' '022' '072' '102' '112' '122' '132' '142' '153' '200' '201' '205' '210' '211' '212' '217' '220' '221' '224' '231' '234' '241' '247' '254' '257' '265' '267' '270' '277' '344' '345' '346' '34
7' '351' '422' '432' '443' '452' '502' '543' '562' '643' '712' '742' '743' '752' '762' '772' 'ca' 'cn' 'email' 'nc' 'secure'

It looks like changing decode(<rawidentity>, 'escape') to convert_from(<rawidentity, 'UTF8') is what's required.  Here's a quick test to prove the point...
Query: select identities_convert_from(certificate) from certificate where id=2936539755;
Result: 'ca' 'cn' 'email' 'nc' 'secure' '上海锐成信息科技有限公司' '司公限有技科息信成锐海上'

Although changing this function is trivial, it would also be necessary to rebuild the Full Text Search index afterwards, which would probably take several days and could cause service disruption.  I'll make a note to do this at some point in the future, although I can't promise when this might happen.

Ethan Wang

unread,
Mar 31, 2021, 9:53:26 PM3/31/21
to crt.sh

Thanks for your reply, your explanation is very clear, I really appreciate it.
Hope this limitation can be fixed in the near future~~

nao. uesugi

unread,
Aug 2, 2023, 5:18:33 AM8/2/23
to crt.sh
Hello

Is this feature not implemented yet?
I don't think it's implemented yet as far as I'm trying it at hand.

2021年4月1日木曜日 10:53:26 UTC+9 Ethan Wang:

r...@sectigo.com

unread,
Aug 18, 2023, 7:08:04 AM8/18/23
to crt.sh
Hi.  Sorry, this still has not been implemented.  Thanks for the reminder though.

r...@sectigo.com

unread,
Nov 16, 2023, 10:32:40 AM11/16/23
to crt.sh
> Although changing this function is trivial, it would also be necessary to rebuild the Full Text Search index afterwards, which would probably take several days and could cause service disruption.  I'll make a note to do this at some point in the future, although I can't promise when this might happen.

Due to recent performance improvements (to both the software and hardware that power crt.sh), I'm now a lot less concerned about the potential for service disruption.

So today I've changed the identities() function as described earlier in this thread, and I've begun to REBUILD CONCURRENTLY the associated indexes.

I'll post a follow-up message when the index rebuilds have all completed.
Reply all
Reply to author
Forward
0 new messages