Can Crt.sh query interfaces support non-english character ?

395 views
Skip to first unread message

Ethan Wang

unread,
Mar 30, 2021, 11:12:05 PM3/30/21
to crt.sh
I used following interface to query list of certificates used by an oragnization,
unfortunately, this oragnization use chinese character and an empty list is returned.

Is there any workaround? Thanks

in...@sql-und-xml.de

unread,
Mar 31, 2021, 2:01:34 AM3/31/21
to crt.sh
You have to use the ASCII-version of the domain name. Something with xn--. That should always work.

Ethan Wang

unread,
Mar 31, 2021, 6:12:30 AM3/31/21
to crt.sh
I was query the interface by organization name, not  the domain name, so it doesn't  work for my case,

r...@sectigo.com

unread,
Mar 31, 2021, 8:12:29 AM3/31/21
to crt.sh
Hi Ethan.  It looks like this is a limitation of the current implementation.

The certificate identities Full Text Search index uses the identities() function, which converts each raw Subject/SAN identity into text using decode(<rawidentity>, 'escape') - see https://github.com/crtsh/certwatch_db/blob/master/fnc/identities.fnc#L36.  Per the postgres documentation, "escape converts zero bytes and high-bit-set bytes to octal sequences (\nnn) and doubles backslashes." (https://www.postgresql.org/docs/12/functions-string.html).

Query: select identities(certificate) from certificate where id=2936539755;
Result:  '002' '012' '022' '072' '102' '112' '122' '132' '142' '153' '200' '201' '205' '210' '211' '212' '217' '220' '221' '224' '231' '234' '241' '247' '254' '257' '265' '267' '270' '277' '344' '345' '346' '34
7' '351' '422' '432' '443' '452' '502' '543' '562' '643' '712' '742' '743' '752' '762' '772' 'ca' 'cn' 'email' 'nc' 'secure'

It looks like changing decode(<rawidentity>, 'escape') to convert_from(<rawidentity, 'UTF8') is what's required.  Here's a quick test to prove the point...
Query: select identities_convert_from(certificate) from certificate where id=2936539755;
Result: 'ca' 'cn' 'email' 'nc' 'secure' '上海锐成信息科技有限公司' '司公限有技科息信成锐海上'

Although changing this function is trivial, it would also be necessary to rebuild the Full Text Search index afterwards, which would probably take several days and could cause service disruption.  I'll make a note to do this at some point in the future, although I can't promise when this might happen.

Ethan Wang

unread,
Mar 31, 2021, 9:53:26 PM3/31/21
to crt.sh

Thanks for your reply, your explanation is very clear, I really appreciate it.
Hope this limitation can be fixed in the near future~~

nao. uesugi

unread,
Aug 2, 2023, 5:18:33 AM8/2/23
to crt.sh
Hello

Is this feature not implemented yet?
I don't think it's implemented yet as far as I'm trying it at hand.

2021年4月1日木曜日 10:53:26 UTC+9 Ethan Wang:

r...@sectigo.com

unread,
Aug 18, 2023, 7:08:04 AM8/18/23
to crt.sh
Hi.  Sorry, this still has not been implemented.  Thanks for the reminder though.

r...@sectigo.com

unread,
Nov 16, 2023, 10:32:40 AM11/16/23
to crt.sh
> Although changing this function is trivial, it would also be necessary to rebuild the Full Text Search index afterwards, which would probably take several days and could cause service disruption.  I'll make a note to do this at some point in the future, although I can't promise when this might happen.

Due to recent performance improvements (to both the software and hardware that power crt.sh), I'm now a lot less concerned about the potential for service disruption.

So today I've changed the identities() function as described earlier in this thread, and I've begun to REBUILD CONCURRENTLY the associated indexes.

I'll post a follow-up message when the index rebuilds have all completed.

pieter hartel

unread,
Nov 18, 2024, 2:34:13 PM11/18/24
to crt.sh
Has this feature been implemented please?
Or is there a workaround using a Postgres query?
--pieter

r...@sectigo.com

unread,
Nov 26, 2024, 10:41:54 AM11/26/24
to crt.sh
Hi Pieter.  Apologies for not following up on this thread previously.

> Due to recent performance improvements (to both the software and hardware that power crt.sh), I'm now a lot less concerned about the potential for service disruption.

It turned out that my lack of concern was misplaced.  I was able to run REINDEX INDEX CONCURRENTLY on some of the smaller GIN indexes on the "certificate" table, but it proved to be impossible to do this for the 2018 to 2024 indexes.  (IIRC, attempting to reindex these larger GIN indexes had a huge negative impact on system performance, so I had to give up).

pieter hartel

unread,
Nov 26, 2024, 10:50:51 AM11/26/24
to crt.sh
Dear Rob,

Thanks for your reply. In understand that it cannot be fixed for now, but I think that there is a workaround.
Please correct me if I'm wrong, but I think that in queries, replacing the non-ascii characters with a % should work, with the LIKE and ILIKE operators.

--pieter

r...@sectigo.com

unread,
Nov 28, 2024, 3:41:10 PM11/28/24
to crt.sh
Hi Pieter.  Whilst that makes sense from an SQL language point of view, unfortunately there are no indexes on the "certificate" table to support LIKE/ILIKE queries.  So whilst LIKE/ILIKE are useful for filtering a result set, currently the function-based GIN index (c_identities) must also be used in order to produce a result set that's small enough for filtering.  Without that, LIKE/ILIKE queries will do a full table scan on the "certificate" table.
Reply all
Reply to author
Forward
0 new messages